- Scala for Machine Learning (Second Edition)
- Patrick R. Nicolas
- 1878字
- 2021-07-08 10:43:09
Expectation-Maximization (EM)
EM was originally introduced to estimate the maximum likelihood in the case of incomplete data [4:7]. The EM algorithm is an iterative method that computes the model features maximizing the likelihood estimate for the observed values, taking unobserved values into account.
The iterative algorithm consists of computing:
- The expectation, E, of the maximum likelihood for the observed data by inferring the latent values (E-step)
- The model features that maximize the expectation E (M-step)
The EM algorithm is applied to solve clustering problems by assuming that each latent variable follows a Normal or Gaussian distribution. This is similar to the K-means algorithm, for which the distance of each data point to the center of each cluster follows a Gaussian distribution [4:8]. Therefore, a set of latent variables is a mixture of Gaussian distributions.
Gaussian mixture model
The latent variables, Zi, can be visualized as the root causes of the behavior (or symptoms) of the observed model, X:

Visualization of observed and latent features
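As an illustration of this generative view, consider the following sketch, in which the latent component z is drawn first and the observation x is then drawn from the Gaussian associated with z. All parameter values here are made up for illustration:

import scala.util.Random

val rng = new Random(42)
val weights = Array(0.3, 0.7)   // p(z): probabilities of the latent components
val means = Array(-1.0, 2.0)
val stdDevs = Array(0.5, 1.5)

def sampleMixture(): Double = {
  val z = if (rng.nextDouble() < weights(0)) 0 else 1 // draw the latent variable z
  means(z) + stdDevs(z) * rng.nextGaussian()          // draw the observation x given z
}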
The latent values Z follow a Gaussian distribution. For the statisticians among us, the mathematics of a mixture model are described here:
Note
Maximization of log likelihood
M7: If x = {xi} is a set of observed features associated with latent features z = {zi}, the probability for the feature xi of observation x given a model parameter θ is:

p(xi | θ) = Σz p(xi | z, θ) p(z | θ)
M8: The objective is to maximize the log likelihood L(θ):

L(θ) = Σi log p(xi | θ) = Σi log Σz p(xi | z, θ) p(z | θ)
EM overview
As far as the implementation is concerned, the Expectation-Maximization algorithm can be broken down into three stages:
- The computation of the log likelihood for the model features given some latent variables (Initial-step).
- The computation of the expectation of the log likelihood at iteration t (E-step).
- The maximization of the expectation at iteration t (M-step).
Note
E-step
M9: The expectation Q of the complete data log likelihood for the model parameters θ(n) at iteration n is computed using the posterior distribution of the latent variable z, p(z|x, θ), and the joint probability of the observation and the latent variable:

Q(θ | θ(n)) = Σz p(z | x, θ(n)) log p(x, z | θ)
M-step
M10: The expectation function is maximized with respect to the model features θ to compute the model parameters θ(n+1) for the next iteration:

θ(n+1) = argmaxθ Q(θ | θ(n))
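To make M9 and M10 concrete, the following self-contained Scala sketch implements a single E-step/M-step iteration for a univariate Gaussian mixture. It is an illustration only; none of these names (EMSketch, gaussianPdf, emStep) belong to the book's library:

object EMSketch {
  def gaussianPdf(x: Double, mu: Double, sigma2: Double): Double =
    math.exp(-0.5 * (x - mu) * (x - mu) / sigma2) / math.sqrt(2.0 * math.Pi * sigma2)

  def emStep(
      x: Array[Double],
      weights: Array[Double],    // mixing coefficients p(z = k)
      means: Array[Double],
      variances: Array[Double]): (Array[Double], Array[Double], Array[Double]) = {
    val K = weights.length

    // E-step (M9): posterior responsibilities p(z = k | xi, θ(n))
    val resp = x.map { xi =>
      val p = Array.tabulate(K)(k => weights(k) * gaussianPdf(xi, means(k), variances(k)))
      val total = p.sum
      p.map(_ / total)
    }

    // M-step (M10): re-estimate weights, means, and variances from the responsibilities
    val nk = Array.tabulate(K)(k => resp.map(_(k)).sum)
    val newWeights = nk.map(_ / x.length)
    val newMeans = Array.tabulate(K)(k =>
      x.zip(resp).map { case (xi, r) => r(k) * xi }.sum / nk(k))
    val newVariances = Array.tabulate(K)(k =>
      x.zip(resp).map { case (xi, r) => r(k) * (xi - newMeans(k)) * (xi - newMeans(k)) }.sum / nk(k))

    (newWeights, newMeans, newVariances)
  }
}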
A formal, detailed, yet short mathematical formulation of the EM algorithm can be found in S. Borman's tutorial [4:9].
Implementation
Let's implement the three steps (Initial-step, E-step, and M-step) in Scala. The internal calculations of the EM algorithm are a bit complex and overwhelming. You may not benefit much from the details of a specific implementation, such as the computation of the eigenvalues of the covariance matrix of the expectation of the log likelihood. This implementation hides some of the complexity by using the Apache Commons Math library package [4:10].
Tip
Inner workings of EM
You may want to download the source code of the implementation of the EM algorithm in the Apache Commons Math library if you need to understand the conditions under which an exception is thrown.
The expectation-maximization algorithm of type MultivariateEM is implemented as a data transformation of type ITransform, as described in the Monadic data transformation section of Chapter 2, Hello World! The two arguments of the constructor are the number of clusters (or Gaussian distributions), K, and the training set, xt (line 2). The constructor initializes the output type V as EMCluster (line 1):

type V = EMCluster //1

class MultivariateEM[@specialized(Double) T: ToDouble](
    K: Int,
    xt: Vector[Array[T]]) extends ITransform[Array[T], V] with Monitor[T] { //2

  val model: Option[EMModel] = train //3

  override def |> : PartialFunction[Array[T], Try[V]]
}
The multivariate expectation-maximization class has a model that consists of a list of EM clusters of type EMCluster. The Monitor trait is used to collect profiling information during training (refer to the Monitor section under Helper classes in the Appendix).
The information about an EM cluster, EMCluster, is defined by a key, the centroid or means value, and the density of the cluster, that is, the standard deviation of the distance of all the data points to the mean (line 4):
case class EMCluster(
key: Double,
means: Array[Double],
density: Array[Double]) //4
type EMModel = List[EMCluster]
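For instance, a trained model for two clusters in two dimensions might look as follows; the values are made up for illustration:

val model: EMModel = List(
  EMCluster(key = 0.57, means = Array(0.12, 0.45), density = Array(0.05, 0.11)),
  EMCluster(key = 0.43, means = Array(0.88, 0.32), density = Array(0.07, 0.09))
)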
The implementation of the EM algorithm in the method train uses the Apache Commons Math MultivariateNormalMixture for the Gaussian mixture model and MultivariateNormalMixtureExpectationMaximization for the EM algorithm:

def train: Option[EMModel] = Try {
  val data: DblMatrix = xt.map(
    _.map(implicitly[ToDouble[T]].apply(_))
  ).toArray //5
  val multivariateEM = new EM(data)
  multivariateEM.fit(estimate(data, K)) //6

  val newMixture = multivariateEM.getFittedModel //7
  val components = newMixture.getComponents.toList //8
  components.map(p => EMCluster(p.getKey,
    p.getValue.getMeans,
    p.getValue.getStandardDeviations)) //9
} match { /* ... */ }
Let's look at the main method, train, of the wrapper class MultivariateEM. The first step is to convert the time series into a primitive matrix of Double values, with historical quotes as rows and stock symbols as columns. The time series xt of type Vector[Array[T]] is converted to a DblMatrix through an induced implicit conversion (line 5).
The initial mixture of Gaussian distributions can be provided by the user or extracted from the dataset with estimate (line 6). The call to getFittedModel triggers the M-step (line 7).
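For readers who want to experiment with the library directly, here is a minimal standalone sketch of the same Apache Commons Math calls outside the wrapper class. The two-dimensional data is made up, and a real dataset needs enough observations to keep the covariance matrices non-singular:

import org.apache.commons.math3.distribution.fitting.MultivariateNormalMixtureExpectationMaximization

val data: Array[Array[Double]] = Array(
  Array(0.10, 0.20), Array(0.15, 0.22), Array(0.12, 0.18), Array(0.14, 0.25),
  Array(0.90, 0.80), Array(0.95, 0.85), Array(0.88, 0.79), Array(0.92, 0.83)
)
val em = new MultivariateNormalMixtureExpectationMaximization(data)
// Initial mixture estimated from the data, then fit runs the E/M iterations
em.fit(MultivariateNormalMixtureExpectationMaximization.estimate(data, 2))
val mixture = em.getFittedModel // the fitted MixtureMultivariateNormalDistribution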
Note
Conversion between Java and Scala collections
Java collections need to be converted to Scala collections using the import scala.collection.JavaConversions. For example, a java.util.List is converted to a scala.collection.immutable.List by invoking the method asScalaIterator of the class WrapAsScala, one of the base traits of JavaConversions.
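A minimal example of this conversion (the list content is made up):

import scala.collection.JavaConversions._

val javaList: java.util.List[Double] = java.util.Arrays.asList(1.0, 2.0, 3.0)
val scalaList: List[Double] = javaList.toList // the implicit conversion supplies toList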
The Apache Commons Math method getComponents returns a java.util.List that is converted to a scala.collection.immutable.List by invoking the method toList (line 8). Finally, the data transform returns a list of cluster information of type EMCluster (line 9).
Tip
Third-party library exceptions
Scala does not enforce the declaration of exceptions as part of the signature of a method. Therefore, there is no guarantee that all types of exceptions will be caught locally. This problem occurs when exceptions are thrown from a third-party library in two scenarios:
- The documentation of the API does not list all the types of exceptions
- The library is updated and a new type of exception is added to a method
An easy workaround is to leverage the Scala exception-handling mechanism:

Try { ... } match {
  case Success(results) => ...
  case Failure(exception) => ...
}
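As a generic illustration, a hedged wrapper around any third-party call might look like this (safeCall is not part of the book's library):

import scala.util.{Failure, Success, Try}

def safeCall[A](call: => A): Option[A] = Try(call) match {
  case Success(result) => Some(result)
  case Failure(exception) => None // log or report the exception here
}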
Classification
The classification of new observations or data points is implemented by the method |>:
override def |> : PartialFunction[Array[T], Try[V]] = {
  case x: Array[T] if isModel && x.length == dimension(xt) =>
    Try(
      model.map(_.minBy(c =>
        euclidean[Double, Double](
          c.means,
          x.map(z => implicitly[ToDouble[T]].apply(z)))
      )).get
    )
}
The method |> is similar to the KMeans.|> classifier.
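A hypothetical invocation, assuming a training set xt and a new observation obs of the proper dimension (the companion apply is used as in the test code later in this section):

val em = MultivariateEM[Double](3, xt) // 3 clusters
val pfnEM = em |>
if (pfnEM.isDefinedAt(obs))
  pfnEM(obs).foreach(cluster => println(s"Assigned to cluster ${cluster.key}"))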
Testing
Let's apply the MultivariateEM class to the clustering of the same 127 stocks used in evaluating the K-means algorithm.
As discussed in the paragraph related to the curse of dimensionality, the number of stocks (127) to analyze restricts the number of observations that the EM algorithm can use. A simple option is to filter out some of the noise in the stock prices and apply a simple sampling method. The maximum sampling rate is limited by the frequency spectrum of the different types of noise in each stock's historical prices.
Tip
Filtering and sampling
The pre-processing of the data using a combination of a simple moving average and fixed-interval sampling prior to clustering is very rudimentary in this example. For instance, we cannot assume that the historical prices of all the stocks share the same noise characteristics. The noise pattern in the price of momentum and heavily traded stocks is certainly different from that of blue-chip securities with strong ownership, held by large mutual funds.
The sampling rate should take into account the frequency spectrum of the noise. It should be set to at least twice the frequency of the lowest-frequency noise.
The objective of the test is to evaluate the impact of the sampling rate, samplingRate, and the number of clusters, K, used in the EM algorithm:
val K = 4; val period = 8
val smAve = SimpleMovingAverage[Double](period) //10
val pfnSmAve = smAve |> //11

val obs = symbolFiles.map(sym => {
  val pfnSrc = DataSource(sym, path, true, 1) |>
  (for {
    xs <- pfnSrc(extractor) //12
    if pfnSmAve.isDefined
    values <- pfnSmAve(xs.head) //13
    y <- Try {
      values.view.indices.drop(period + 1).toVector
        .filter(_ % samplingRate == 0)
        .map(values(_)).toArray //14
    }
  } yield y).getOrElse(Array.empty[Double])
})
em(K, obs) //15
The first step is to create a simple moving average with a predefined period (line 10), as described in the Simple moving average section of Chapter 3, Data Pre-Processing. The test code instantiates the partial function pfnSmAve that implements the moving average computation (line 11). The symbols of the stocks under consideration are extracted from the names of the files in the path directory (line 12).
The execution of the moving average (line 13) generates a set of smoothed values that are sampled at the given sampling rate, samplingRate (line 14). Finally, the expectation-maximization algorithm is instantiated to cluster the sampled data in the em method (line 15):
def em(K: Int, obs: DblMatrix): Int = {
  val em = MultivariateEM[Double](K, obs.toVector) //16
  show(s"${em.toString}") //17
}
The method em instantiates the EM algorithm for a specific number of clusters, K (line 16). The content of the model is displayed by invoking MultivariateEM.toString. The results are aggregated and then displayed in a textual format on the standard output (line 17).
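To evaluate the impact of both parameters systematically, the pipeline could be wrapped in a simple sweep; run is a hypothetical helper that executes the smoothing, sampling, and clustering steps above for one (K, samplingRate) pair:

for {
  k <- 2 to 5
  samplingRate <- Seq(5, 10, 20)
} run(k, samplingRate) // hypothetical: smooth, sample, then cluster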
The first test is to execute the EM algorithm with K=3 clusters and a sampling period of 10 on data smoothed by a simple moving average with a period of 8. Sampling the historical prices of the 127 stocks between January 1, 2013 and December 31, 2013 at a frequency of 0.1 Hz produces 24 data points. The following chart displays the mean of each of the three clusters:

Chart of the normalized means per cluster using EM K=3
The mean vectors of clusters 2 and 3 have similar patterns, which may suggest that a set of three clusters is accurate enough to provide us with a first insight into the similarity within groups of stocks. The following is a chart of the normalized standard deviation per cluster using EM K = 3:

Chart of the normalized standard deviation per cluster using EM K=3
The distribution of the standard deviation along the mean vector of each cluster can be explained by the fact that the prices of stocks from a couple of industries declined in tandem, while others rose as a semi-homogeneous group, following the announcement from the Federal Reserve that the monthly quantity of bonds purchased as part of the quantitative easing program would soon be reduced.
Note
Relation to K-Means
You may wonder what the relation is between EM and K-means, as both techniques address the same problem. The K-means algorithm assigns each observation uniquely to one and only one cluster. The EM algorithm assigns an observation based on posterior probability. K-means is a special case of the EM for Gaussian mixtures [4:11].
Online EM
Online learning is a powerful strategy for training a clustering model on very large datasets, and it has regained interest among researchers lately. A description of online EM is beyond the scope of this tutorial, but you should know that several online variants of EM are available if you ever have to deal with large datasets: batch EM, stepwise EM, incremental EM, and Monte Carlo EM [4:12].