## Time series forecasting.

We consider the problem of forecasting a random variable $y$ based on the information contained in some vector $x$. The vector $x$ is treated as a sample of a random variable that we also denote by $x$.

Proposition

We evaluate the quality of a forecast $\hat y$ by its mean quadratic deviation $E\left[(y-\hat y)^2\right]$. Consequently, the optimal forecast is given by the conditional expectation $\hat y = E\left[y \mid x\right]$.

Proof

The forecast is some function of $x$: $\hat y = f(x)$. Therefore, we require that the function of interest $f$ satisfy $F(f + \varepsilon g) \geq F(f)$ for any smooth function $g$ and small real number $\varepsilon$, where the functional $F$ is defined by $F(f) = E\left[(y - f(x))^2\right]$. Thus, we have $\left.\frac{d}{d\varepsilon} F(f + \varepsilon g)\right|_{\varepsilon = 0} = -2\, E\left[(y - f(x))\, g(x)\right] = 0.$ By properties of conditional expectation, $E\left[(y - f(x))\, g(x)\right] = E\left[\left(E[y \mid x] - f(x)\right) g(x)\right] = 0.$ The last equation holds for any smooth $g$. Therefore, $f(x) = E[y \mid x]$, as claimed.
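The optimality of the conditional expectation can be checked numerically. The following is a minimal sketch; the model $y = x^2 + \text{noise}$ is an invented example, chosen because then $E[y \mid x] = x^2$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented model for illustration: y = x**2 + noise, so E[y | x] = x**2.
n = 100_000
x = rng.standard_normal(n)
y = x**2 + rng.standard_normal(n)

# Quadratic deviation of the conditional-expectation forecast ...
mse_conditional = np.mean((y - x**2) ** 2)
# ... versus a perturbed forecast f(x) + g(x) with g(x) = 0.5 x.
mse_other = np.mean((y - (x**2 + 0.5 * x)) ** 2)

print(mse_conditional, mse_other)
```

The first value is close to the noise variance $1$, and any perturbation of the forecast increases the deviation, in line with the proposition.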

If the information vector $x$ is sufficiently large then it makes sense to look for the forecast among linear functions: $f(x) = A x$. Repeating the above argument, we obtain $E\left[(y - f(x))\, g(x)\right] = 0$ with linear $g$ and linear $f$.

We extend the above requirement to the case of a vector-valued variable $y$. We require that the function $\hat y = f(x)$, $f(x) = A x$, for some matrix $A$, be such that for any linear $g$, the random quantities $y - \hat y$ and $g(x)$ are not correlated: $E\left[(y - A x)(B x)^{\mathrm T}\right] = 0$ for every matrix $B$. We see that the forecasting operation is similar to a linear projection of $y$ on $x$. This motivates the introduction of the operation $\langle y, x \rangle := E\left[y\, x^{\mathrm T}\right]$ that takes two vector-valued random variables and produces a deterministic matrix. We would like to treat this operation as a scalar product even though it does not have the full set of scalar-product properties. We will construct a projection operation $P_x$ with the defining properties $$P_x y = A x \text{ for some deterministic matrix } A, \qquad \langle y - P_x y,\, x \rangle = 0.$$ We proceed to verify that, on the subclass of random variables with zero mean, the projection may be defined by the straightforward adaptation of the formula from elementary geometry: $$P_x y = \langle y, x \rangle \langle x, x \rangle^{-1} x.$$ Indeed, $P_x y$ is linear in $x$, and $$\langle y - P_x y,\, x \rangle = \langle y, x \rangle - \langle y, x \rangle \langle x, x \rangle^{-1} \langle x, x \rangle = 0.$$

From the two properties verified here, all the other well-known properties of orthogonal projection follow. This allows us to use geometric intuition in the computations that follow.
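The projection formula and its defining orthogonality property can be sketched numerically for zero-mean samples. The dimensions and the data-generating model below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Zero-mean vector-valued sample: x in R^3, y in R^2 (arbitrary dimensions).
n = 200_000
x = rng.standard_normal((n, 3))
y = x @ rng.standard_normal((3, 2)) + 0.1 * rng.standard_normal((n, 2))

# <y, x> = E[y x^T] and <x, x> = E[x x^T], estimated by sample averages.
yx = y.T @ x / n
xx = x.T @ x / n

# P_x y = <y, x> <x, x>^{-1} x, i.e. P_x y = A x with A as below.
A = yx @ np.linalg.inv(xx)
residual = y - x @ A.T

# Defining property: the residual is orthogonal to x, <y - P_x y, x> = 0.
print(np.abs(residual.T @ x / n).max())
```

With sample averages in place of expectations, the orthogonality holds exactly up to floating-point error, because the same sample moments enter both the projection matrix and the check.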

The operation $P_x$ is well defined if the matrix $\langle x, x \rangle$ is not degenerate. This is certainly so if the random variable $x$ has zero mean, because then $\langle x, x \rangle$ is the covariance matrix of $x$. However, if $x$ is deterministic then $\langle x, x \rangle = x x^{\mathrm T}$ is degenerate. Therefore, we represent a random variable $x$ as the sum $x = \bar x + \dot x$, where $\bar x = E[x]$ and $\dot x = x - E[x]$. Observe that for two vector-valued random variables $y$ and $x$, $$\langle \dot y, \bar x \rangle = E[\dot y]\, \bar x^{\mathrm T} = 0.$$ Therefore, $\langle \dot y, c \rangle = 0$ for any deterministic $c$. Hence, we extend the definition of $P_x$ to random variables with non-zero mean by orthogonality: $$P_x y = \bar y + P_{\dot x} \dot y.$$ Here we used the fact that the error $y - P_x y$ must satisfy $\langle y - P_x y,\, \bar x \rangle = 0$ and $\langle y - P_x y,\, \dot x \rangle = 0$, with $y - P_x y$ decomposed into its mean and zero-mean parts. The only way this can happen is if the mean part of $P_x y$ equals $\bar y$ and the zero-mean part equals $P_{\dot x} \dot y$.

Summary

The linear forecast of $y$ based on the information $x$ is given by $$\hat y = \bar y + \langle \dot y, \dot x \rangle \langle \dot x, \dot x \rangle^{-1} (x - \bar x),$$ where $\bar x = E[x]$, $\bar y = E[y]$, $\dot x = x - \bar x$, $\dot y = y - \bar y$.
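The summary formula can be sketched as follows for data with non-zero means; the particular linear-plus-noise model is an invented example for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented model: x in R^2 with non-zero mean, y scalar, noise std 0.2.
n = 100_000
x = rng.standard_normal((n, 2)) + np.array([1.0, -2.0])
y = 3.0 + x @ np.array([0.5, -1.5]) + 0.2 * rng.standard_normal(n)

# Centering: x = x_bar + x_dot, y = y_bar + y_dot.
x_bar, y_bar = x.mean(axis=0), y.mean()
x_dot, y_dot = x - x_bar, y - y_bar

cov_yx = y_dot @ x_dot / n    # <y_dot, x_dot>
cov_xx = x_dot.T @ x_dot / n  # <x_dot, x_dot>

# Linear forecast: y_hat = y_bar + <y_dot, x_dot> <x_dot, x_dot>^{-1} (x - x_bar).
y_hat = y_bar + (x - x_bar) @ np.linalg.solve(cov_xx, cov_yx)

mse = np.mean((y - y_hat) ** 2)
print(mse)
```

The residual quadratic deviation is close to the noise variance $0.2^2 = 0.04$: the linear forecast recovers the linear part of the model exactly, leaving only the unforecastable noise.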

One can verify that this is a maximum likelihood forecast if $x$ and $y$ are jointly normal. To see this, it is enough to use the jointly normal distribution function to compute the conditional distribution of $y$ given $x$. In the section ( Kalman filter II ) we will take such an approach.
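For reference, the standard statement of the conditional distribution of a jointly normal pair, whose mean coincides with the linear forecast above:

```latex
% Conditional distribution for a jointly normal pair (y, x).
\begin{pmatrix} y \\ x \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \bar y \\ \bar x \end{pmatrix},
\begin{pmatrix}
\Sigma_{yy} & \Sigma_{yx} \\
\Sigma_{xy} & \Sigma_{xx}
\end{pmatrix}
\right)
\;\Longrightarrow\;
y \mid x \sim N\!\left(
\bar y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \bar x),\;
\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}
\right).
```

Here $\Sigma_{yx} = \langle \dot y, \dot x \rangle$ and $\Sigma_{xx} = \langle \dot x, \dot x \rangle$ in the notation of the summary, so the conditional mean is exactly the linear forecast $\hat y$.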
