Gradient descent
An SGD implementation of gradient descent uses a simple distributed sampling of the data examples. Recall that the loss part of the optimization problem is the average

$\frac{1}{n} \sum_{i=1}^{n} L(w; x_i, y_i)$

so the exact average of the per-example gradients,

$\frac{1}{n} \sum_{i=1}^{n} \nabla L(w; x_i, y_i)$,

is a true (sub-)gradient. Computing it requires access to the full dataset, which is not optimal. The parameter miniBatchFraction specifies the fraction of the full data to use instead. The average of the gradients over this sampled subset,

$\frac{1}{|S|} \sum_{i \in S} \nabla L(w; x_i, y_i)$,

is a stochastic gradient, where $S$ is a sampled subset of size $|S| = \text{miniBatchFraction} \cdot n$.
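To make the sampling idea concrete, here is a minimal, Spark-free sketch of the stochastic gradient computation. It assumes the squared loss for simplicity, and the names MiniBatchGradient, exampleGradient, and stochasticGradient are illustrative, not MLlib API:

import scala.util.Random

object MiniBatchGradient {
  // Gradient of the squared loss L(w; x, y) = (w.x - y)^2 / 2
  // for a single example: (w.x - y) * x.
  def exampleGradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
    x.map(_ * err)
  }

  // Average the per-example gradients over a random subset S with
  // |S| = miniBatchFraction * n: this average is a stochastic gradient.
  def stochasticGradient(w: Array[Double],
                         data: Seq[(Array[Double], Double)],
                         miniBatchFraction: Double,
                         seed: Long): Array[Double] = {
    val rng = new Random(seed)
    val sampleSize = math.max(1, (miniBatchFraction * data.size).toInt)
    val subset = rng.shuffle(data).take(sampleSize)
    val sum = subset
      .map { case (x, y) => exampleGradient(w, x, y) }
      .reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
    sum.map(_ / sampleSize)
  }
}

Because the subset is drawn uniformly, repeated calls give unbiased but noisy estimates of the true gradient; with miniBatchFraction set to 1.0, the subset is the whole dataset and the result is the exact gradient.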
In the following code, we show how to use stochastic gradient descent on a mini-batch to compute the weights and the loss. The output of this program is a vector of weights and the loss.
import scala.util.Random

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}

object SparkSGD {
  def main(args: Array[String]): Unit = {
    val m = 4        // number of examples
    val n = 200000   // number of features per example
    val sc = new SparkContext("local[2]", "SparkSGD")

    // Generate m labeled points with n random features each,
    // spread over two partitions and cached for reuse across iterations.
    val points = sc.parallelize(0 until m, 2).mapPartitionsWithIndex { (idx, iter) =>
      val random = new Random(idx)
      iter.map(i => (1.0, Vectors.dense(Array.fill(n)(random.nextDouble()))))
    }.cache()

    // Run mini-batch SGD with the logistic loss gradient and an L2 updater.
    val (weights, loss) = GradientDescent.runMiniBatchSGD(
      points,
      new LogisticGradient,
      new SquaredL2Updater,
      0.1,  // stepSize
      2,    // numIterations
      1.0,  // regParam
      1.0,  // miniBatchFraction
      Vectors.dense(new Array[Double](n)))  // initial weights (all zeros)

    println("w:" + weights(0))
    println("loss:" + loss(0))
    sc.stop()
  }
}
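Note that runMiniBatchSGD returns a tuple: the final weight vector and an array containing the stochastic loss computed at each iteration (which is why loss(0) prints the loss from the first iteration). Because miniBatchFraction is set to 1.0 here, every iteration samples the entire dataset, so this run is effectively full-batch gradient descent; setting the fraction below 1.0 yields true stochastic updates.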