Machine Learning with Spark (Second Edition)
Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
Gradient descent
An SGD implementation of gradient descent uses a simple distributed sampling of the data examples. Recall that the loss part of the optimization problem is the average $\frac{1}{n}\sum_{i=1}^{n} L(w; x_i, y_i)$, and therefore $\frac{1}{n}\sum_{i=1}^{n} \frac{\partial}{\partial w} L(w; x_i, y_i)$ is a true (sub)gradient.

Computing this requires access to the full dataset, which is not optimal.

The parameter miniBatchFraction specifies the fraction of the full data to use instead. The average of the gradients over this subset,

$$\frac{1}{|S|}\sum_{i \in S} \frac{\partial}{\partial w} L(w; x_i, y_i),$$

is a stochastic gradient, where $S$ is a sampled subset of size $|S| = \text{miniBatchFraction} \cdot n$.
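To make the averaging concrete before moving to the Spark example, here is a minimal sketch of a single mini-batch step for squared loss over plain in-memory arrays, with Spark's distributed machinery left out. The names miniBatchStep, data, and rng are illustrative and not part of Spark's API.

import scala.util.Random

object MiniBatchSketch {
  // One mini-batch SGD step for squared loss on in-memory data:
  // sample roughly |S| = miniBatchFraction * n examples, average their
  // gradients, and move w a step of size stepSize against that average.
  def miniBatchStep(
      data: Array[(Double, Array[Double])], // (label, features) pairs
      w: Array[Double],
      stepSize: Double,
      miniBatchFraction: Double,
      rng: Random): Array[Double] = {
    // Bernoulli sampling approximates a subset of the requested size
    val sample = data.filter(_ => rng.nextDouble() < miniBatchFraction)
    val grad = new Array[Double](w.length)
    for ((y, x) <- sample) {
      // Squared-loss gradient for one example: (w . x - y) * x
      val err = w.indices.map(j => w(j) * x(j)).sum - y
      for (j <- grad.indices) grad(j) += err * x(j)
    }
    val s = math.max(sample.length, 1) // guard against an empty sample
    w.indices.map(j => w(j) - stepSize * grad(j) / s).toArray
  }
}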
The following code shows how to use stochastic gradient descent on a mini-batch to compute the weights and the loss. The output of this program is a vector of weights and the loss.
import scala.util.Random

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{GradientDescent, LogisticGradient, SquaredL2Updater}

object SparkSGD {
  def main(args: Array[String]): Unit = {
    val m = 4        // number of examples
    val n = 200000   // number of features per example
    val sc = new SparkContext("local[2]", "SparkSGD")
    // Generate m labeled points with n random features each, seeding
    // each partition's RNG with its partition index for reproducibility.
    val points = sc.parallelize(0 until m, 2).mapPartitionsWithIndex { (idx, iter) =>
      val random = new Random(idx)
      iter.map(i => (1.0, Vectors.dense(Array.fill(n)(random.nextDouble()))))
    }.cache()
    val (weights, loss) = GradientDescent.runMiniBatchSGD(
      points,
      new LogisticGradient,    // gradient of the logistic loss
      new SquaredL2Updater,    // updater with L2 regularization
      0.1,                     // stepSize
      2,                       // numIterations
      1.0,                     // regParam
      1.0,                     // miniBatchFraction (1.0 = full data)
      Vectors.dense(new Array[Double](n))) // initial weights: all zeros
    println("w:" + weights(0))
    println("loss:" + loss(0))
    sc.stop()
  }
}
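Run locally, this prints the first component of the fitted weight vector and the loss recorded for the first iteration. Note that with miniBatchFraction set to 1.0, each iteration samples the entire dataset, so the stochastic gradient coincides with the exact batch gradient; lowering the fraction trades gradient accuracy for cheaper iterations.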