捕鱼免费版

書名： Scala for Machine Learning（Second Edition）
作者名： Patrick R. Nicolas
本章字數： 3444字
更新時間： 2021-07-08 10:43:08

The discrete Kalman filter

The Kalman filter is a mathematical model that provides an accurate and recursive computation approach to estimate the previous states and predict the future states of a process for which some variables may be unknown. R.E. Kalman introduced it in the early 1960s to model dynamics systems and predict trajectory in aerospace [3:11]. Today, the Kalman filter is used to discover a relationship between two observed variables that may or may not be associated with other hidden variables. In this respect, the Kalman filter shares some similarities with the hidden Markov model (HMM) described in the Hidden Markov model section of Chapter 7, Sequential Data Models [3:12].

The Kalman filter is:

A predictor of the next data point from the current observation
A filter that weeds out noise by processing the last two observations
A smoothing model that identifies trends from a history of observations

Note

Smoothing versus filtering:

Smoothing is an operation that removes high-frequency fluctuations from a time series or signal. Filtering consists of selecting a range of frequencies to process the data. In this regard, smoothing is somewhat similar to low-pass filtering. The only difference is that a low-pass filter is usually implemented through linear methods.

Conceptually, the Kalman filter estimates the state of a system from noisy observations. The Kalman filter has two characteristics:

Recursive: A new state is predicted and corrected using the input of a previous state
Optimal: This is an optimal estimator because it minimizes the mean square error of the estimated parameters (against actual values)

The Kalman filter is one of the stochastic models that are used in adaptive control [3:13].

Note

Kalman and non-linear systems:

The Kalman filter estimates the internal state of a linear dynamic system. However, it can be extended to model non-linear state space using linear or quadratic approximation functions. These filters are known as, you guessed it, Extended Kalman Filters (EKFs), the theory of which is beyond the scope of this book.

The following section is dedicated to discrete Kalman filters for linear systems, as applied to financial engineering. A continuous signal can be converted to a time series using the Nyquist frequency.

The state space estimation

The Kalman filter model consists of two core elements of a dynamic system - a process that generates data and a measurement that collects data. These elements are referred to as the state space model. Mathematically speaking, the state space model consists of two equations:

Transition equation: This describes the dynamic of the system including the unobserved variables
Measurement equation: This describes the relationship between the observed and unobserved variables

The transition equation

Let's consider a system with a linear state x_t of n variables and a control input vector u_t. The prediction of the state at time t is computed by a linear stochastic equation (M12):

A_t is the square matrix of dimension n that represents the transition from state x_t-1 at t-1 to state x_t at t. The matrix is intrinsic to the dynamic system under consideration.
B_t is an n by n matrix that describes the control input model (external action on the system or model). It is applied to the control vector u_t.
w_t represents the noise generated by the system or from a probabilistic point of view the uncertainty on the model. It is known as the process white noise.

The control input vector represents the external input (or control) to the state of the system. Most systems, including our financial example later in the chapter, have no external input to the state of the model.

Note

White and Gaussian noise:

A white noise is a Gaussian noise, following a normal distribution with zero mean.

The measurement equation

The measurement of m values zt of the state of the system is defined by the following equation (M13):

H_t is a matrix m by n that models the dependency of the measurement to the state of the system.
v_t is the white noise introduced by the measuring devices. Similarly to the process noise, v follows a Gaussian distribution with zero mean and a variance R, known as the measurement noise covariance.

Note

Time dependency model:

We cannot assume that the parameters of the generalized discrete Kalman filter such as state transition At, control input Bt, and observation (or measurement dependency) Ht matrices are independent from time. However, these parameters are constant in most practical applications.

The recursive algorithm

The set of equations for the discrete Kalman filter is implemented as a recursive computation with two distinct steps:

The algorithm uses the transition equations to estimate the next observation
The estimation is created with the actual measurement for this observation

The recursion is visualized in the following diagram:

Overview diagram of the recursive Kalman algorithm

Let's illustrate the prediction and correction phases in the context of filtering financial data, in a manner similar to the moving average and Fourier transform. The objective is to extract the trend and the transitory component of the yield of the 10-year treasury bond. The Kalman filter is particularly suitable to the analysis of interest rates for two reasons:

Yields are the results of multiple factors, some of which are not directly observable.
Yields are influenced by the policy of the Federal Reserve that can be easily modeled by the control matrix.

The 10-year Treasury bond has a higher trading volume than bonds with longer maturity, making the trend in interest rates a bit more reliable [3:14].

Applying the Kalman filter to clean raw data requires you to define a model that encompasses both observed and non-observed states. In the case of the trend analysis, we can safely create our model with a two-variable state: the current yield xt and the previous yield x_t-1.

Note

State of dynamic systems:

The term "state" refers to the state of the dynamic system under consideration, not the state of the execution of the algorithm.

This implementation of the Kalman filter uses the Apache Commons Math library. Therefore, we need to specify the implicit conversion from our primitives introduced in the Primitives and implicits section of Chapter 1, Getting Started to Apache Commons Math types RealMatrix, RealVector, Array2DRowRealMatrix, and ArrayRealVector:

type DblMatrix = Array[Array[Double]]

implicit def double2RealMatrix(x: DblMatrix): RealMatrix = 
    new Array2DRowRealMatrix(x)
implicit def double2RealRow(x: DblVec): RealMatrix = 
    new Array2DRowRealMatrix(x)
implicit def double2RealVector(x: DblVec): RealVector = 
    new ArrayRealVector(x)

The client code has to import the implicit conversion functions within its scope.

The Kalman model assumes that process and measurement noise follows a Gaussian distribution, also known as white noise. For the sake of maintainability, the generation of the white noise is encapsulated in the class QRNoise with the following arguments (line 1):

qr: Tuple of scale factors for the process noise matrix Q and the measurement noise R
profile: Noise profile with the normal distribution as default

The two methods, noiseQ and noiseR, generate an array of two independent white noise elements (line 2):

val normal = Stats.normal(_)

class QRNoise(qr: DblPair,profile: Double=>Double = normal){ //1 
  def q = profile(qr._1)
  def r = profile(qr._2)
  lazy val noiseQ = Array[Double](q, q)   //2
  lazy val noiseR = Array[Double](r, r)
}

Tip

Experimenting with noise profile:

Although the discrete Kalman filter assumes the noise profile follows a normal distribution, the class QRNoise allows the user to experiment with different noise profiles.

The easiest approach to manage the matrices and vectors used in the recursion is to define them as arguments of a configuration class kalmanConfig. The arguments of the configuration follow the naming convention defined in the mathematical formulas: A is the state transition matrix, B is the control matrix, H is the matrix of observations defining the dependencies between measurement and system state, and P is the covariance error matrix:

case class KalmanConfig(A: DblMatrix, B: DblMatrix, 
                       H: DblMatrix, P: DblMatrix)
  type U = Vector[DblPair]  //3
  type V = Vector[DblPair]  //4

Let us implement the Kalman filter as a transformation DKalman of type ETransform on a time series with a predefined configuration KalmanConfig:

class DKalman(config: KalmanConfig)(implicit qrNoise: QRNoise) 
    extends ETransform[U, V](config) {
  type KRState = (KalmanFilter, RealVector)  //5
  override def |> : PartialFunction[U, Try[V]]
   ...
}

As with any explicit data transformation, we need to specify the type U and V (lines 3 and 4) that are identical: the Kalman filter does not alter the structure of the data, only the values. We define an internal state for the Kalman computation, KRState, by creating a tuple of two Apache Commons Math types, KalmanFilter and RealVector (line 5).

The key elements of the filter are now in place and it's time to implement the prediction-correction cycle portion of the Kalman algorithm.

Prediction

The prediction phase consists of estimating the state x (yield of the Treasury bond) using the transition equation. We assume that the Federal Reserve has no material effect on the interest rates, making control input matrix B null. The transition equation can be easily resolved using simple operations on matrices:

Visualization of the transition equation of the Kalman filter

The purpose of this exercise is to evaluate the impact of the different parameters of the transition matrix A in terms of smoothing.

Note

The control input matrix B:

In this example, the control matrix B is null because there is no known, deterministic external action on the yield of the 10-year treasury bond. However, the yield can be affected by unknown parameters that we represent as hidden variables. The matrix B would be used to model the decision of the Federal Reserve regarding asset purchases and federal fund rates, for example.

The mathematics behind the Kalman filter presented as reference to the implementation in Scala use the same notation for matrices and vectors. It is not a prerequisite to understand the Kalman filter and its implementation in the next section. If you have a natural inclination toward linear algebra, the next paragraph describes the two equations for the prediction step.

The prediction step is defined as follows:

The prediction of the state at time t is computed by extrapolating the state estimate:

A is the square matrix of dimension n that represents the transition from state x at t-1 to state x at time t.

x't is the predicted state of the system based on the current state and the model A.

B is the vector of n dimension that describes the input to the state.

The mean square error matrix P, which is to be minimized, is updated through the following formula:

Where:

AT is the transpose of the state transition matrix.

Q is the process white noise described as a Gaussian distribution with a zero mean and a variance Q, known as the noise covariance.

The state transition matrix is implemented using the matrix and vector classes included in the Apache Common Math library. The types of matrices and vectors are automatically converted into RealMatrix and RealVector classes.

The implementation of the equation M14 is as follows:

x = A.operate(x).add(qrNoise.create(0.03, 0.1))

The new state is predicted (or estimated), and then used as input to the correction step.

Correction

The second step of the recursive Kalman algorithm is the correction of the estimated yield of the 10-year Treasury bond with the actual yield. In this example, the white noise of the measurement is negligible. The measurement equation is simple because the state is represented by the current and previous yield, and their measurement z.

Visualization of the measurement equation of the Kalman filter

The sequence of mathematical equations of the correction phase consists of updating the estimation of the state x using the actual values z and computing the Kalman gain K.

The correction step is defined as follows:

M16: The state of the system x is estimated from the actual measurement z through the formula:

Where:

r_t is the residual between the predicted measurements and the actual measured values

K_t is the Kalman gain for the correction factor

M17: The Kalman gain is computed as follows:

Here, H_T is the matrix transpose of H and P_t' is the estimate of the error covariance.

Kalman smoothing

It is time to put our knowledge of the transition and measurement equations to the test. The Apache common library defines two classes, DefaultProcessModel and DefaultMeasurementModel, to encapsulate the components of the matrices and vectors. The historical values for the yield of the 10-year Treasury bond are loaded through the DataSource method and mapped to a smoothed series that is the output of the filter:

  override def |> : PartialFunction[U, Try[V]] = {
    case xt: U if( !xt.isEmpty) => Try( 
      xt.map { case(current, prev) => {
        val models = initialize(current, prev) //6
        val nState = newState(models) //7
        (nState(0), nState(1))  //8
      }}
    ) 
   }

The data transformation for the Kalman filter initializes the process and measurement model for each data point in the private method initialize (line 6), update the state using the transition and correction equations iteratively in the method newState (line 7), and return the filtered series of pair values (line 8).

Tip

Exception handling:

The code to catch and process exceptions thrown by the Apache Commons Math library is omitted as standard practice in the book. As far as the execution of the Kalman filter is concerned, the following exceptions have to be handled:

NonSquareMatrixException
DimensionMismatchException
MatrixDimensionMismatchException

The method initialize encapsulates the initialization of the process model, pModel (line 9) and measurement (observations dependencies) model, mModel (line 10) as defined in the Apache Common Math library:

def initialize(current: Double, prev: Double): KRState = {  
  val pModel = new DefaultProcessModel(
    config.A, config.B, Q, input, config.P
  ) //9
  val mModel = new DefaultMeasurementModel(config.H, R) //10
  val in = Array[Double](current, prev)
  (new KalmanFilter(pModel,mModel), new ArrayRealVector(in))
}

The exceptions thrown by the Apache Commons Math API are caught and processed through a Try instance. The iterative prediction and correction of the smoothed yields of 10-year Treasury bond is implemented by the newState method. The method iterates through the following steps:

Estimate the new values of the state by invoking an Apache Commons Math KalmanFilter.predict method that implements the formula M14 (line 11).
Apply the formula M12 to the new state x at time t (line 12).
Compute the measured value z at time t using the M13 formula (line 13).
Invoke the Apache Commons Math KalmanFilter.correct method to implement the formula M16 (line 14).

Return the estimate value for the state x by invoking the Apache Commons Math KalmanFilter.getStateEstimation method (line 15):

def newState(state: KRState):  Array[Double] = {
  state._1.predict  //11
  val x = config.A.operate(state._2).add(qrNoise.noisyQ) //12
  val z = config.H.operate(x).add(qrNoise.noisyR) //13
  state._1.correct(z)  //14
  state._1.getStateEstimation  //15
}

Tip

Exit condition:

In the code snippet for the method newState, the iteration for specific data points exits when the maximum number of iterations is reached. A more elaborate implementation consists of either evaluating the matrix P at each iteration or estimation converged within a predefined range.

Fixed lag smoothing

So far, we have studied the Kalman filtering algorithm. We need to adapt it to the smoothing of time series. The fixed lag smoothing technique consists of backward correcting the previous data point taking into account the latest actual value.

An N-lag smoother defines the input as a vector X = {x_t-N-1 , x_t-N-2 , ... x_t } for which the value xt-N-j is corrected taking into account the current value of xt.

The strategy is quite similar to the hidden Markov model forward and backward passes (refer to the Evaluation section under Hidden Markov model (HMM) in Chapter 7, Sequential Data Models).

Note

Complex strategies for lag smoothing:

There are numerous formulas or methodologies to implement an accurate fixed lag smoothing strategy and correct the predicted observations. Such strategies are beyond the scope of this book.

Experimentation

The objective is to smooth the yield of the 10-year Treasury bond using a two-step lag smoothing algorithm.

Note

Two-step lag smoothing:

M18: 2-step lag smoothing algorithm for state St using a single smoothing factor α:

The state equation updates the values of the state [x_t, x_t-1] using the previous state [x_t-1, x_t-2] where x_t represents the yield of the 10-year Treasury bond at time t. This is accomplished by shifting the values of the original time series {x0 … x_n-1} by 1 using the drop method: X1={x₁ … x_n-1}, creating a copy of the original time series without the last element X2={x₀ … x_n-2}, and zipping X₁ and X₂. This process is implemented by the zipWithShift method introduced in the first section of the chapter.

The resulting sequence of state vector S_k = [x_k, x_k-1]T is processed by the Kalman algorithm:

Import YahooFinancials._ 
val RESOURCE_DIR = ?resources/data/filtering/?
implicit val qrNoise = new QRNoise((0.7, 0.3)) //16

val H: DblMatrix = ((0.9, 0.0), (0.0, 0.1))    //17
val P0: DblMatrix = ((0.4, 0.3), (0.5, 0.4))   //18
val ALPHA1 = 0.5; val ALPHA2 = 0.8

(src.get(adjClose)).map(zt => {  //19
   twoStepLagSmoother(zt, ALPHA1)     //20
   twoStepLagSmoother(zt, ALPHA2)
})

Tip

Implicit noise instance:

The noise for the process and measurement is defined as an implicit argument to the Kalman filter, DKalman, for two reasons:

The profile of the noise is specific to the process or system under evaluation and its measurement. It is independent from the Kalman configuration parameters A, B, and H. Therefore, it cannot be a member of the KalmanConfig class.
The same noise characteristics should be shared with other alternative filtering techniques, if needed.

The white noise for the process and the measurement are initialized implicitly with the value qrNoise (line 16). The code initializes the matrices H of the measurement dependencies on the state (line 17) and P0 containing the initial covariance errors (line 18). The input data is extracted from a CSV file containing the daily Yahoo financial data (line 19). Finally, the method executes the two-step lag smoothing algorithm, twoStepLagSmoother, with two different alpha parameter values, ALPHA1 and ALPHA 2 (line 20).

Let's consider the twoStepLagSmoother method:

def twoStepLagSmoother(zSeries: DblVec,alpha: Double): Int = { 
  val A: DblMatrix = ((alpha, 1.0-alpha), (1.0, 0.0))  //21
  val xt = zipWithShift(1)  //22
  val pfnKalman = DKalman(A, H, P0) |>    //23
  pfnKalman(xt).map(filtered =>          //24
    display(zSeries, filtered.map(_._1), alpha) )
}

The method twoStepLagSmoother takes two arguments:

A single variable time series zSeries
A state transition parameter alpha

It initializes the state transition matrix A using the exponential moving average decay parameter alpha (line 21). It creates the two-step lag time series xt using the zipWithShift method (line 22). It extracts the partial function, pfnKalman (line 23), processes, and finally displays the two-step lag time series (line 24).

Tip

Modeling state transition and noise:

The state transition and the noise related to the process have to be selected carefully. The resolution of the state equations relies on the Cholesky (QR) decomposition, which requires a non-negative definite matrix. The implementation in the Apache common library throws a NonPositiveDefiniteMatrixException if the principle is violated.

The smoothed yield is plotted along the raw data as follows:

Output of the Kalman filter for 10-year Treasury-Bond historical prices

The Kalman filter can smooth the historical yield of the 10-year Treasury bond while preserving the spikes and lower frequency noise.

Let's analyze the data for a shorter period during which the noise is the strongest, between the 190th and the 275th trading days:

Output of Kalman filter for the 10-year Treasury bond prices 0.8-.02

The high-frequency noise has been significantly reduced without canceling the actual spikes. The distribution 0.8, 0.2 takes into consideration the previous state and favors the predicted value. Contrarily, a run with a state transition matrix A [0.2, 0.8, 0.0, 1.0] that favors the latest measurement will preserve the noise, as seen in the following graph:

Output of Kalman filter for the 10-year Treasury bond price 0.2-0.8

Benefits and drawbacks

The Kalman is a very useful and powerful tool in understanding the distribution of the noise between the process and observation. Contrary to the low- or band-pass filters based on the discrete Fourier transform, the Kalman filter does not require the computation of the frequencies spectrum or make any assumption on the range of frequencies of the noise.

However, the linear discrete Kalman filter has its limitations:

The noise generated by both the process and the measurement has to be Gaussian. Processes with non-Gaussian noise can be modeled with techniques such as Gaussian sum filter or adaptive Gaussian mixture [3:15].
It requires that the underlying process is linear. However, researchers have been able to formulate extensions to the Kalman filter, known as the extended Kalman filter (EKF) to filter signals from non-linear dynamic systems, at the cost of significant computational complexity.

Note

Continuous-time Kalman filter:

The Kalman filter is not restricted to dynamic systems with discrete states, x. The case of continuous state-time is handled by modifying the state transition equation so the estimated state is computed as derivative dx/dt.

官术网_书友最值得收藏!

Scala for Machine Learning（Second Edition）

The discrete Kalman filter

Note

Note

The state space estimation

The transition equation

Note

The measurement equation

Note

The recursive algorithm

Note

Tip

Prediction

Note

Correction

Kalman smoothing

Tip

Tip

Fixed lag smoothing

Note

Experimentation

Note

Tip

Tip

Benefits and drawbacks

Note