To Model or not to Model

A reasonable, fundamental question about numerical modeling is: should we do it at all for a specific problem?

This may seem a little odd, but believe me, modeling is a huge amount of work, and we really need to answer this question, based on what we know and what we want to achieve, before we even start to think about any details.

I want to make sure you realize that this is a very important question that must be answered precisely. There are several valid reasons for modeling:

There are many more cases in which models are very poor choices; note that this is the general case in the geosciences:


My first rule of modeling is to:

(1) only model when you already know the answer!

We will talk about the reason for this in class, but a big part of it is the second rule of modeling, namely that:


(2) any (every?) model has errors

and that the larger the model, or the more unknown the model space, the more likely the errors are to be important.

We can round this discussion out with my 3rd and 4th rules:

(3) the best use for models in the geosciences is to investigate and understand the real question, not to find the answer

After all, Science is mainly about understanding the questions (Engineering is about finding answers)


(4) everyone should write one large model, but a smart scientist learns from that experience

(I leave it to you to figure out what should be learned)


Numerical Approaches

I want to make an important distinction between numerical modeling and numerical experiments.  Researchers often carry out numerical or analytical experiments, such as “back of the envelope” calculations, scaling arguments, or reduction of dimensions.  An experiment is what I call an investigation where the researcher either doesn’t understand enough to model, or simply wants to avoid the complexity of modeling, but knows enough about a problem to ask “what if” questions.  The answers are used to shape our thinking, but cannot be strongly defended (and often turn out later to be incorrect).  Numerical experiments often use similar (often simpler) code compared to numerical models, but should represent a much greater spirit of investigation, and usually require more mature scientific thinking.  By learning how to do numerical models you will be in a much better position to segue into numerical experiments, and thereby increase your productivity.  In my world-view, numerical experiments are what scientists should aspire to, while most numerical modeling is best left to engineers.
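As a concrete, hypothetical instance of the scaling-argument style of experiment, here is a back-of-the-envelope calculation of how long heat takes to diffuse a given distance. The diffusivity and length scale are assumed, typical values, not taken from any particular problem:

```python
# Back-of-the-envelope scaling: how long does heat take to diffuse a
# distance L?  Dimensional analysis of the heat equation gives
# t ~ L**2 / kappa -- no geometry, no boundary conditions, just scaling.

def diffusion_time(L_m, kappa_m2_s):
    """Characteristic diffusion time (s) for length L (m) and diffusivity kappa (m^2/s)."""
    return L_m**2 / kappa_m2_s

# Assumed, typical values: thermal diffusivity of rock ~1e-6 m^2/s,
# and a 1 km thick layer.
seconds_per_year = 3.156e7
t = diffusion_time(1000.0, 1e-6)        # seconds
print(t / seconds_per_year)             # ~3e4 years
```

The point is not the number itself, but that one line of dimensional analysis can tell you whether a full transient heat-flow model is even worth building.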

Understanding the difference between modeling and experimenting, as well as between good and bad modeling, is good insurance in this day and age.  It allows you to protect yourself from the mass of scientific errors (quacks and hacks?) that hide behind unsound models.  As you can tell, bad modeling is a personal peeve of mine.

Models

I define a model as something that abstracts (mimics a simplified) reality and allows us to manipulate inputs, outputs, and the transfer functions that convert inputs into outputs. You can make virtually any type of model, from physical models, to analog models, to numerical models, to whatever you feel like. And of course you can model just about anything. However, in this course we make a strong distinction between models that allow a strong degree of validation and those that don't.  Models of stock market behavior, lotteries, etc. are outside this course because they are difficult (impossible) to check, and besides, they violate the first rule.  (Numerical experiments can’t usually be validated either, but the actual results are only of passing interest; mainly you are interested in the process.  However, you shouldn’t put too much faith in experiments either.)

There are many ways you can approach modeling of the world. One way is to break problems into discrete and continuous, and to recognize that computers deal most easily with discrete problems.  However, historically, computers have not usually been used that way.  The majority of problems are set up as a continuum; in other words, they are assumed to be slowly varying, smooth functions of space and time.  This is a holdover from the world of continuous mathematics.  Most of the math we learn is based on the concept of smooth curves and surfaces that are generally differentiable, i.e., they have a defined slope and are not kinked, except at single points.  The fact that we go through a process of taking the real world, casting it into continuous math, and then trying to model does not mean that this is the best approach.  More and more, researchers are skipping the intermediate step of casting the real world into math, and jumping straight to cellular or inherently discrete models.  The benefits are great in terms of speed and complexity, but the costs are huge, in that we move further away from being able to prove our results correct.  Note, though, that most proofs of any sort regarding numerical modeling are based on the underlying continuous math, not on the numerics.
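To illustrate what an inherently discrete model looks like, here is a minimal sketch of a 1D sandpile-style cellular model. The toppling rule and threshold are invented for illustration, not taken from any particular paper:

```python
# A minimal inherently-discrete (cellular) model: a 1D sandpile.
# Cells hold integer grain counts; any cell at or above a threshold
# "topples", shedding two grains, one to each neighbor.  Grains that
# topple past the ends fall off.  No differential equation anywhere.

def relax(cells, threshold=2):
    """Topple until every cell is below threshold; return the stable state."""
    cells = list(cells)
    unstable = True
    while unstable:
        unstable = False
        for i, h in enumerate(cells):
            if h >= threshold:
                cells[i] -= 2
                if i > 0:
                    cells[i - 1] += 1
                if i < len(cells) - 1:
                    cells[i + 1] += 1
                unstable = True
    return cells

print(relax([0, 4, 0]))   # the central pile spreads until every cell is stable
```

Nothing in this model was ever a differential equation; the toppling rule itself is the kernel, which is exactly why such models are hard to tie back to provable continuous math.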

Validation of models is itself a huge topic. Essentially it is impossible to validate a model of any complexity. This would require a vast amount of input and output data spanning the entire range of model response, which essentially makes the model moot.

However, if we restrict ourselves to models that can be spot-checked over at least part of their range, we find that models generally require three essentials: a kernel (the governing equation or rule), a domain over which it applies, and boundary conditions.

In a steady-state model (one that doesn't change with time), the boundary conditions (BCs) don't change. In a transient problem (one that does change in time), the boundary conditions may change, and in addition we must specify the starting state of the model, referred to as the initial conditions (ICs).

A Few Comments on the Elements of Models

Many beginning modelers underestimate the absolute necessity of BCs and ICs, and the overriding need to fully specify the domain. In fact, in most physical models, the boundary conditions are more important than the kernel. This is because, in most physical processes, the values at the boundaries propagate into the interior of the problem with time. A corollary is that modeling requires a lot of data on the domain and the BCs to get accurate results. Typically, errors in output stem as much or more from a lack of data than from errors in the model itself.
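A quick sketch makes the propagation claim concrete: for the steady-state 1D Laplace problem (about the simplest possible kernel), the interior solution is determined entirely by the two boundary values. The grid size, sweep count, and BC values below are arbitrary illustration choices:

```python
# Steady-state 1D Laplace problem (u'' = 0): iterate "each interior cell
# is the average of its neighbors" (a Gauss-Seidel sweep) and watch the
# boundary values propagate inward until they fix the whole interior.

def solve_laplace_1d(u_left, u_right, n_interior=9, sweeps=2000):
    u = [u_left] + [0.0] * n_interior + [u_right]   # interior initial guess: 0
    for _ in range(sweeps):
        for i in range(1, len(u) - 1):
            u[i] = 0.5 * (u[i - 1] + u[i + 1])      # average of neighbors
    return u

u = solve_laplace_1d(0.0, 100.0)
print(u)   # converges to the straight line between the two boundary values
```

After enough sweeps, every interior value is set by the boundaries alone; change either end value and the entire interior changes with it.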

When you go to a talk, and are impressed by the sophistication of the model, and the pretty output graphics, be sure to ask yourself: was there really enough data at the input to reach this output uniquely? (Mark Twain said it better in his famous quote).

The Kernel

Most models of geo problems start with trying to describe some physical process in mathematical form. There are a variety of approaches, but one of the oldest and most successful has been to look for a differential (or integro-differential) equation whose response mimics the physical system. Although this step is often glossed over, it is actually a difficult step to make for any new process. As can be seen in the note (and our work) on heat flow, one usually makes numerous simplifying assumptions in producing the mathematical equation. It is very easy to lose sight of these assumptions after the model is made, and it is very difficult to know, while making the model, which assumptions can be justified and which will turn out to be important later on.
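As a generic example of this step (standing in for the heat-flow note, which is not reproduced here): to arrive at the familiar 1D heat conduction equation

    ∂T/∂t = κ ∂²T/∂x²

(T is temperature, κ is thermal diffusivity), one typically assumes a homogeneous, isotropic medium, constant material properties, no internal heat sources, and transport by conduction only. Each of those assumptions removed a term from the full energy balance, and each is a candidate to become important later.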

To a large extent, after you have settled on a kernel equation, and assembled your data on BCs and the solution domain, you have very little thinking left to do, merely a lot of hard work. When most people think of modeling, they think of this final part of the problem. However, most of the thinking should have gone into the first part.

Solving the kernel equation

If you have ended up with a differential equation as your kernel, you are lucky: there are many different solution techniques. Two of the largest classes of solvers are those based on Finite Differences and those based on Finite Elements. In Finite Difference techniques, you construct an approximate numerical description of the kernel equation, and then solve that approximation exactly. In Finite Element techniques, you preserve (in theory) the exact equation, but seek solutions that approximate it in some sense. F.E. methods are generally somewhat better (faster, and easier to focus on a small sub-region); however, F.D. methods are generally easier to code initially and are often used in preliminary work and non-production code.
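As a sketch of the Finite Difference idea (not production code; the grid spacing, time step, and diffusivity are arbitrary illustration values), here is the classic explicit (FTCS) discretization of the 1D heat equation ∂T/∂t = κ ∂²T/∂x²:

```python
# Explicit (FTCS) finite differences for dT/dt = kappa * d2T/dx2.
# Both derivatives are replaced by differences on a grid; the resulting
# difference equation is then advanced exactly, step by step.

def step_heat(T, kappa, dx, dt):
    """Advance one explicit time step; both ends are held fixed (Dirichlet BCs)."""
    r = kappa * dt / dx**2            # must satisfy r <= 0.5 for stability
    new = T[:]
    for i in range(1, len(T) - 1):
        new[i] = T[i] + r * (T[i - 1] - 2.0 * T[i] + T[i + 1])
    return new

# Hot spike in the middle of a cold bar; the BCs pin both ends at 0.
T = [0.0] * 5 + [100.0] + [0.0] * 5
for _ in range(200):
    T = step_heat(T, kappa=1.0, dx=1.0, dt=0.25)   # r = 0.25, stable
```

Note how the approximation is baked in up front: both derivatives were replaced by differences, and what gets solved exactly is the difference equation, not the original one. The stability restriction r ≤ 1/2 is the standard price of the explicit scheme.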

If you don’t end up with a simple equation in the kernel, then you need to be creative.  We will investigate a couple of problems that don’t lend themselves to pretty differential equation solutions.