In this post, we use the DynaML Scala machine learning environment to train Gaussian Process models to analyse time series data taken from a coal power plant.

### The Data Set

From the Daisy system identification database, we download the Abbott power plant data. The data characteristics are summarized below.

Description: The data comes from a model of a steam generator at the Abbott Power Plant in Champaign, IL.

Sampling period: 3 s

Number of samples: 9600

Inputs:

1. Fuel, scaled 0-1
2. Air, scaled 0-1
3. Reference level (inches)
4. Disturbance, defined by the load level

Outputs:

5. Drum pressure (PSI)
6. Excess oxygen in exhaust gases (%)
7. Level of water in the drum
8. Steam flow (kg/s)
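The examples further down select one of these eight signals through a 1-based `column` argument (5 to 8 for the outputs). The sketch below shows how that step might look. It assumes the downloaded file stores one sample per line, with the eight signals as whitespace-separated columns in the order listed above. We have not verified the exact Daisy file layout. `AbbottLoaderSketch` and its methods are hypothetical helpers, not part of DynaML. The train/test split mirrors the `num_training`/`num_test` arguments only under the further assumption that training samples come before test samples.

```scala
object AbbottLoaderSketch {
  /** Parse whitespace-separated numeric rows (one sample per line), skipping blank lines. */
  def parseRows(text: String): Vector[Vector[Double]] =
    text.split("\\r?\\n").iterator
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+").map(_.toDouble).toVector)
      .toVector

  /** Extract one signal by its 1-based column number (assumed: 1-4 inputs, 5-8 outputs). */
  def column(rows: Vector[Vector[Double]], col: Int): Vector[Double] = {
    require(col >= 1, "columns are numbered from 1")
    rows.map(r => r(col - 1))
  }

  /** Hypothetical split: the first nTrain samples for training, the next nTest for testing. */
  def trainTestSplit(series: Vector[Double], nTrain: Int, nTest: Int): (Vector[Double], Vector[Double]) =
    (series.take(nTrain), series.slice(nTrain, nTrain + nTest))
}
```

For example, `column(rows, 5)` would return the drum pressure series under the assumed layout.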

### Nonlinear AutoRegressive with eXogenous inputs (NARX)

In a NARX model, a candidate output signal $y(t)$ is modeled as a function of the previous $p$ values of itself and of the $m$ exogenous inputs $u_{1}, \cdots, u_{m}$.
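One common way to write a NARX model, and the convention the sketch below assumes, is

$$y(t) = f\big(y(t-1), \cdots, y(t-p), u_{1}(t-1), \cdots, u_{1}(t-p), \cdots, u_{m}(t-1), \cdots, u_{m}(t-p)\big) + \epsilon(t)$$

where $f$ is an unknown nonlinear function and $\epsilon(t)$ is a noise term. The plain-Scala sketch turns an output series and its exogenous inputs into (regressor, target) pairs that any regression model, including a Gaussian Process, can be trained on. Two choices here are our assumptions: the same lag order $p$ is used for the output and for every input, and the current input values $u_j(t)$ are excluded. That the `deltaT` argument in the power plant examples plays the role of $p$ is also our assumption. `NarxFeatures` is a hypothetical name.

```scala
object NarxFeatures {
  /**
   * Build NARX training pairs. For each time t >= p, the regressor stacks
   * y(t-1), ..., y(t-p) followed by u_j(t-1), ..., u_j(t-p) for every
   * exogenous input j, and the target is y(t).
   */
  def narxPairs(y: Vector[Double], us: Vector[Vector[Double]], p: Int): Vector[(Vector[Double], Double)] = {
    require(p >= 1, "lag order p must be at least 1")
    require(us.forall(_.length == y.length), "all series must have the same length")
    (p until y.length).toVector.map { t =>
      val pastOutputs = (1 to p).map(k => y(t - k))                 // autoregressive part
      val pastInputs = us.flatMap(u => (1 to p).map(k => u(t - k))) // exogenous part
      ((pastOutputs ++ pastInputs).toVector, y(t))
    }
  }
}
```

With `y = Vector(1.0, 2.0, 3.0, 4.0)`, one input `Vector(10.0, 20.0, 30.0, 40.0)` and `p = 2`, the first pair is `(Vector(2.0, 1.0, 20.0, 10.0), 3.0)`. Each series of length $N$ therefore yields $N - p$ training pairs.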

### Gaussian Processes

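Gaussian Processes are powerful non-parametric methods for solving regression and classification problems. They rest on a structural assumption about the finite-dimensional distributions over a space of functions: for any finite set of inputs, the corresponding function values are jointly Gaussian. The mean vector and covariance matrix of that Gaussian are given by a mean function and a covariance (kernel) function.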
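In the usual notation, a Gaussian Process prior over $f$ has mean function $m(x)$ and covariance (kernel) function $K(x, x')$. It states this assumption as follows, where $X = (x_1, \cdots, x_n)$ is any finite set of inputs, $\mathbf{m} = (m(x_1), \cdots, m(x_n))^{\top}$, and $K(X, X)$ is the $n \times n$ matrix with entries $K(x_i, x_j)$:

$$f(x) \sim \mathcal{GP}\big(m(x), K(x, x')\big) \quad \Longleftrightarrow \quad \big(f(x_1), \cdots, f(x_n)\big)^{\top} \sim \mathcal{N}\big(\mathbf{m}, K(X, X)\big) \ \text{for every finite } X$$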

#### Posterior Predictive Distribution

Given training data $X = (x_1, x_2, \cdots, x_n)$ and $\mathbf{y} = (y_1, y_2, \cdots, y_n)$, one can use Bayes' theorem to calculate the posterior predictive distribution $\mathbf{f_*} \mid X, \mathbf{y}, X_*$ of the function values $\mathbf{f_*}$ at the test inputs $X_*$, which are assumed to be known.
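The standard assumptions are a zero prior mean and i.i.d. Gaussian observation noise with variance $\sigma_n^2$. Under these assumptions, the posterior predictive distribution has a closed form, writing $K(A, B)$ for the matrix of kernel evaluations between the points of $A$ and $B$:

$$\mathbf{f_*} \mid X, \mathbf{y}, X_* \sim \mathcal{N}\big(\bar{\mathbf{f}}_*, \operatorname{cov}(\mathbf{f_*})\big)$$

$$\bar{\mathbf{f}}_* = K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,\mathbf{y}$$

$$\operatorname{cov}(\mathbf{f_*}) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,K(X, X_*)$$

The plain-Scala sketch below evaluates the mean and the diagonal of the covariance. It uses a Cholesky factorization rather than an explicit matrix inverse. This is not DynaML's implementation. The names `GpPosteriorSketch`, `polyKernel` and `posterior` are hypothetical. The kernel form $k(x, x') = (x \cdot x' + c)^d$ is our assumption about what `PolynomialKernel(2, 0.49)` computes. The `noiseVar` argument stands in for the role the `DiracKernel` term presumably plays in the power plant examples.

```scala
object GpPosteriorSketch {
  type Vec = Array[Double]

  // Assumed polynomial kernel form: k(x, y) = (x . y + offset)^degree.
  def polyKernel(degree: Int, offset: Double): (Vec, Vec) => Double =
    (x, y) => math.pow(x.indices.map(i => x(i) * y(i)).sum + offset, degree)

  // Lower-triangular L with A = L L^T; A must be symmetric positive definite.
  def cholesky(a: Array[Vec]): Array[Vec] = {
    val n = a.length
    val l = Array.fill(n, n)(0.0)
    for (i <- 0 until n; j <- 0 to i) {
      val s = (0 until j).map(k => l(i)(k) * l(j)(k)).sum
      if (i == j) l(i)(i) = math.sqrt(a(i)(i) - s)
      else l(i)(j) = (a(i)(j) - s) / l(j)(j)
    }
    l
  }

  // Solve L z = b by forward substitution.
  def forward(l: Array[Vec], b: Vec): Vec = {
    val z = new Array[Double](b.length)
    for (i <- b.indices) {
      val s = (0 until i).map(k => l(i)(k) * z(k)).sum
      z(i) = (b(i) - s) / l(i)(i)
    }
    z
  }

  // Solve L^T x = z by back substitution.
  def backward(l: Array[Vec], z: Vec): Vec = {
    val n = z.length
    val x = new Array[Double](n)
    for (i <- (n - 1) to 0 by -1) {
      val s = (i + 1 until n).map(k => l(k)(i) * x(k)).sum
      x(i) = (z(i) - s) / l(i)(i)
    }
    x
  }

  /** (mean, variance) of the latent f_* at each test input; variance excludes the noise. */
  def posterior(kernel: (Vec, Vec) => Double, noiseVar: Double,
                xTrain: Array[Vec], yTrain: Vec, xTest: Array[Vec]): Array[(Double, Double)] = {
    val n = xTrain.length
    // K(X, X) + sigma_n^2 I
    val kxx = Array.tabulate(n, n) { (i, j) =>
      kernel(xTrain(i), xTrain(j)) + (if (i == j) noiseVar else 0.0)
    }
    val l = cholesky(kxx)
    val alpha = backward(l, forward(l, yTrain)) // (K + sigma_n^2 I)^{-1} y
    xTest.map { xs =>
      val kStar = xTrain.map(xi => kernel(xs, xi)) // one row of K(X_*, X)
      val mean = kStar.indices.map(i => kStar(i) * alpha(i)).sum
      val v = forward(l, kStar)                    // L^{-1} k_*
      (mean, kernel(xs, xs) - v.map(e => e * e).sum)
    }
  }
}
```

As a check, take training inputs $-1, 0, 1$ with targets $1, 0, 1$ (the parabola $y = x^2$), degree 2, offset 0.49 and a tiny noise variance such as `1e-6`. The predicted mean at $x = 2$ is then close to 4. This happens because a degree-2 polynomial kernel can represent that parabola exactly.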

For an in-depth treatment of Gaussian Processes, refer to the book.

## Modelling Power Plant Outputs

### Drum pressure (PSI)

```scala
AbottPowerPlant(new PolynomialKernel(2, 0.49), new DiracKernel(0.09),
  opt = Map("globalOpt" -> "GS", "grid" -> "4", "step" -> "0.004"),
  num_training = 200, num_test = 1000, deltaT = 2, column = 5)
```
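The `opt` map selects how the kernel hyperparameters are tuned. We read `"GS"` as grid search, `"grid"` as the number of grid points per hyperparameter, and `"step"` as the spacing between points. That reading is our assumption and is not confirmed here. Under it, the search can be sketched in plain Scala as an exhaustive scan starting from the initial hyperparameter values. The objective would typically be the negative log marginal likelihood of the training data; here it is left as a function argument. `GridSearchSketch` is a hypothetical name, not DynaML's API.

```scala
object GridSearchSketch {
  /**
   * Exhaustive grid search: hyperparameter i takes the values
   * start(i) + k * step for k = 0 until gridSize, and the combination
   * with the lowest objective value is returned with that value.
   */
  def gridSearch(start: Vector[Double], gridSize: Int, step: Double)
                (objective: Vector[Double] => Double): (Vector[Double], Double) = {
    require(gridSize >= 1, "grid must contain at least one point")
    val axes = start.map(s => (0 until gridSize).map(k => s + k * step).toVector)
    // Cartesian product of all axes: gridSize^(number of hyperparameters) candidates.
    val candidates = axes.foldLeft(Vector(Vector.empty[Double])) { (acc, axis) =>
      for (prefix <- acc; v <- axis) yield prefix :+ v
    }
    candidates.map(c => (c, objective(c))).minBy(_._2)
  }
}
```

For instance, start at `(0, 0)` with 4 points per axis and a step of 1, and use a quadratic objective centered at `(2, 1)`. The search then selects exactly `(2.0, 1.0)`. The number of candidates grows exponentially with the number of hyperparameters, which is why a small grid such as 4 points is used.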

### Excess Oxygen in exhaust gases (as %)

```scala
AbottPowerPlant(new PolynomialKernel(2, 0.49), new DiracKernel(0.09),
  opt = Map("globalOpt" -> "GS", "grid" -> "4", "step" -> "0.004"),
  num_training = 200, num_test = 1000, deltaT = 2, column = 6)
```

### Level of water in the drum

```scala
AbottPowerPlant(new PolynomialKernel(2, 0.49), new DiracKernel(0.09),
  opt = Map("globalOpt" -> "GS", "grid" -> "4", "step" -> "0.004"),
  num_training = 200, num_test = 1000, deltaT = 2, column = 7)
```

### Steam flow (kg/s)

```scala
AbottPowerPlant(new PolynomialKernel(2, 0.49), new DiracKernel(0.09),
  opt = Map("globalOpt" -> "GS", "grid" -> "4", "step" -> "0.004"),
  num_training = 200, num_test = 1000, deltaT = 2, column = 8)
```

## Source Code

Below is the example program as a GitHub gist. To view the original program in DynaML, click here.