Gradient descent
An SGD implementation of gradient descent uses a simple distributed sampling of the data examples. The loss part of the optimization problem is an average over all data points, so computing its true (sub)gradient requires access to the full dataset, which is not optimal.
The parameter miniBatchFraction specifies the fraction of the full data to sample instead. The average of the gradients over this sampled subset, S, of size |S| = miniBatchFraction · n, is a stochastic gradient.
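In symbols (a standard mini-batch formulation; the symbols f_i, w, and n are introduced here for illustration and are not notation from the chapter), the stochastic gradient is

    \[
    g_S(w) = \frac{1}{|S|} \sum_{i \in S} f_i'(w),
    \]

where f_i is the loss on the i-th of n examples. When S is sampled uniformly, the expectation of g_S(w) equals the full gradient \frac{1}{n} \sum_{i=1}^{n} f_i'(w).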
In the following code, we show how to use stochastic gradient descent on a mini-batch to compute the weights and the loss. The output of this program is a vector of weights and an array of loss values.
    import scala.util.Random

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}

    object SparkSGD {
      def main(args: Array[String]): Unit = {
        val m = 4       // number of data points
        val n = 200000  // number of features per point
        val sc = new SparkContext("local[2]", "")
        // Generate m labeled points with n random features each, over 2 partitions
        val points = sc.parallelize(0 until m, 2).mapPartitionsWithIndex { (idx, iter) =>
          val random = new Random(idx)
          iter.map(i => (1.0, Vectors.dense(Array.fill(n)(random.nextDouble()))))
        }.cache()
        // Run mini-batch SGD with logistic loss and L2 regularization
        val (weights, loss) = GradientDescent.runMiniBatchSGD(
          points,
          new LogisticGradient,
          new SquaredL2Updater,
          0.1, 2, 1.0, 1.0,
          Vectors.dense(new Array[Double](n)))
        println("w:" + weights(0))
        println("loss:" + loss(0))
        sc.stop()
      }
    }
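The four numeric arguments to runMiniBatchSGD are, in order, the step size (0.1), the number of iterations (2), the regularization parameter (1.0), and miniBatchFraction (1.0). With a fraction of 1.0, every iteration samples the full dataset, so the run above is effectively batch gradient descent. A minimal variation that samples only part of the data per iteration might look like the following sketch (the 0.5 fraction is an illustrative choice, not a value from the original program):

    // Sample roughly half the data in each iteration (miniBatchFraction = 0.5)
    val (w2, loss2) = GradientDescent.runMiniBatchSGD(
      points, new LogisticGradient, new SquaredL2Updater,
      0.1, 2, 1.0, 0.5,
      Vectors.dense(new Array[Double](n)))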