Intuition behind Gaussian Processes

If you own an oil company, your job is to drill for as much oil as possible while minimizing costs. Since the primary cost involves drilling the holes, your goal is to retrieve the maximum amount of oil per hole drilled. How, then, can you predict where the oil will be before you drill the holes? A divining rod, of course ... or maybe it is not possible. But maybe it is possible to drill a limited number of holes and use the information from those locations to make an optimal guess about where the biggest well is located.[1] This is one application of Gaussian processes, a complicated but extremely powerful tool in math and statistics that we at SigOpt use to understand the world one experiment at a time.

Can we use Gaussian processes to predict the future? Maybe, but not all circumstances are appropriate for study with Gaussian processes. Consider a fair coin: no matter how many coin flips we observe, the probability of the next heads is still ½. This is because the flips we observe are independent: knowledge of one tells us nothing about the others. Fortunately, many settings have a more helpful structure, in which previous observations provide insight into unobserved outcomes. Imagine, for instance, that you knew the temperature at 5 locations inside an empty room; it is reasonable to assume (both from thermodynamics and common sense) that the temperature anywhere else in that room should be related to your observations. This is the setting of Figure 1.
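The contrast between coin flips and room temperatures comes down to correlation. As a sketch (with made-up temperatures and a made-up correlation), the textbook conditioning formula for a bivariate normal shows that observing one variable only updates our belief about another when the two are correlated:

```python
# Conditioning one jointly Gaussian variable on another.
# For a bivariate normal (x1, x2):
#   mean of x2 given x1 = mu2 + rho * (sigma2 / sigma1) * (x1 - mu1)
#   var  of x2 given x1 = sigma2**2 * (1 - rho**2)
def conditional(mu1, mu2, sigma1, sigma2, rho, x1):
    mean = mu2 + rho * (sigma2 / sigma1) * (x1 - mu1)
    var = sigma2 ** 2 * (1 - rho ** 2)
    return mean, var

# Independent case (rho = 0), like coin flips: observing a reading of 25
# degrees at one spot leaves our belief about the other spot unchanged.
m0, v0 = conditional(20.0, 20.0, 2.0, 2.0, 0.0, 25.0)
# mean stays 20.0, variance stays 4.0

# Correlated case (rho = 0.9), like two nearby spots in a room: a warm
# reading shifts our prediction upward and shrinks our uncertainty.
m1, v1 = conditional(20.0, 20.0, 2.0, 2.0, 0.9, 25.0)
# mean rises to 24.5, variance drops to 0.76
```

A Gaussian process extends exactly this update from two variables to a whole continuum of them.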

Figure 1: This box represents a room; the dots represent five locations at which we have measured the temperature; the colored lines represent locations with identical temperatures like in a topographical map. Figure 1a: The “simplest” temperature profile of the room. Figure 1b: Another profile which still respects the data. Which would you think is more likely to represent the real world?

Once we stipulate that there is some relationship between observations, the inevitable question revolves around how they are related and how those observations can be digested to make accurate predictions. The assumption that our observations came from a Gaussian process is a very strong assumption, but it allows us great power to make provably optimal predictions. An added bonus is that we can compute the uncertainty of those predictions, which may be as important as making any prediction at all.

So what is a Gaussian process[2]? First off, it exists over some domain, and, although its official definition is rather abstract, it is enough to think of a Gaussian process as a collection of random variables, one for each point in the domain, any finite set of which is jointly Gaussian. At any location in the domain, the Gaussian process defines an expected value, and that expected value is our best prediction. Unfortunately, just saying that something exists does not make it useful; we must actually compute that best prediction to, e.g., estimate how profitable a proposed well will be. Fortunately, Gaussian processes define not just an expected value but also a covariance, a mechanism by which different points in the domain interact. We can study previous observations, e.g., oil wells that have already been drilled, to learn about this interaction and optimally predict the success of future wells.
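The "study previous observations to predict new ones" step is just Gaussian conditioning done with matrices. Below is a minimal sketch of it using a squared-exponential covariance; the kernel choice, the function names, and the five observations are all illustrative assumptions for this post, not a description of SigOpt's production system:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance: nearby inputs get similar values."""
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / length_scale ** 2)

def gp_posterior(x_obs, y_obs, x_new, noise=1e-8):
    """Condition a zero-mean GP on observations at x_obs.

    Returns the posterior mean (the best prediction) and covariance
    (the remaining uncertainty) at the new locations x_new.
    """
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_star = rbf_kernel(x_new, x_obs)
    K_new = rbf_kernel(x_new, x_new)
    mean = K_star @ np.linalg.solve(K, y_obs)
    cov = K_new - K_star @ np.linalg.solve(K, K_star.T)
    return mean, cov

# Five made-up observations on a 1-D domain (e.g., drilled wells).
x_obs = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
y_obs = np.array([1.2, 0.8, 2.0, 1.5, 0.3])

mean, cov = gp_posterior(x_obs, y_obs, np.array([2.5, 10.0]))
# At an observed location the prediction reproduces the data and the
# uncertainty is near zero; far from the data the posterior reverts to
# the prior (mean near zero, variance near one).
```

The posterior mean is the dark blue curve in Figure 2a, and the diagonal of the posterior covariance gives the width of the light blue band.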

Figure 2a: Given observations in red, we make our optimal prediction in dark blue; the area in light blue represents the uncertainty of our optimal prediction at various points. Figure 2b: At “a single input,” any value is possible, but values near the best prediction are most likely.

This is the story told in Figure 2, where we consider a one-dimensional domain in which wells have been drilled by previous companies (the red circles). Left and right represent a physical location (maybe latitude), and the height of a circle represents the profitability of that well. Given the circles, Figure 2a shows both the best prediction we can make (the dark blue line) and a region of the most likely values (the light blue shaded area) within which we predict, with great certainty, the profitability will lie. Now, suppose we have already bought the land at the dashed line and we want to know how profitable that well will be. Figure 2b shows the distribution of profitability for that well (left is less profitable, right is more profitable); higher values of the likelihood mean that those profitability values are more likely, and the best prediction the Gaussian process gives us is the most likely result.
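The band in Figure 2a and the bell curve in Figure 2b are two views of the same fact: at any single input, the Gaussian process's prediction is itself a Gaussian distribution. With made-up numbers, if the posterior at the dashed line had mean μ = 1.4 (say, in millions of dollars) and standard deviation σ = 0.3, a 95% interval follows from the usual Gaussian rule:

```python
# Hypothetical posterior at the dashed-line location (made-up numbers).
mu, sigma = 1.4, 0.3

# For a Gaussian, roughly 95% of the probability mass lies within
# 1.96 standard deviations of the mean.
low, high = mu - 1.96 * sigma, mu + 1.96 * sigma
print(f"95% interval: ({low:.3f}, {high:.3f})")  # (0.812, 1.988)
```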

Okay, so, we have the ability to take observations and make predictions, which is a common goal in, e.g., cartography during the creation of a topographic relief map (see Figure 3). One of the early applications of this statistical predictive methodology was to produce a profile of gold deposits in South Africa given a limited number of actual measurements; today, many use the name kriging in honor of Danie Krige, the pioneer of this strategy.

Figure 3: Physical quantities, such as height, can be represented with a Gaussian process. Here, heights from the Mt. Eden volcano[3] are predicted using observed measurements[4].

At SigOpt, we focus on a slightly different problem than simply understanding what our process looks like: we want to find the location at which this process is maximized. When drilling for oil this can consist of finding the latitude/longitude/depth where the most profitable well is expected. When implementing a machine learning algorithm, this can consist of optimizing the learning strategy so as to make the best recommendations for your customers; our client Sonica uses this to more accurately make song predictions for their users. When developing physical products, this can consist of choosing the chemical quantities which produce the desired product at the minimum cost; our client Advent Lab Group Northwest has used this strategy to better design shaving cream.

In each of these situations, expert knowledge is required to build the underlying system we are attempting to optimize, whether that is an oil rig, a machine learning model, or a cosmetic product. Once the system is built, finding the best variation of that system is often a non-intuitive process that is commonly performed inefficiently via brute-force trial and error. In contrast, SigOpt leverages techniques like Gaussian processes to provide a guided search through the complex space of possible parameters. This allows experts to build the next great model and apply their domain expertise instead of searching in the dark for the best experiment to run next. With SigOpt you can conquer this tedious, but necessary, element of development and unleash your experts on designing better products with less trial and error.

[1]: Obviously, the oil industry has advanced tools, such as sonar and geological surveys, for aiding predictions. This example emphasizes optimizing the use of minor exploratory wells as a supplement to, not replacement for, other practices.

[2]: Some statisticians, especially those from spatial statistics, prefer to reserve the term “process” for evolution in time, which is why the term Gaussian random field is also commonly used to describe the same concept.

[3]: Image courtesy of http://www.tedperkins.com/nz/C2Cmt_eden.htm

[4]: The data used in this example is from the R datasets package https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/volcano.html

Mike studies mathematical and statistical tools for interpolation and prediction. Prior to joining SigOpt, he spent time in the math and computer science division at Argonne National Laboratory and was a visiting assistant professor at the University of Colorado-Denver where he co-wrote a text on kernel-based approximation. Mike holds a PhD and MS in Applied Mathematics from Cornell and a BS in Applied Mathematics from Illinois Institute of Technology.
Mike McCourt, PhD