- Machine Learning with Spark (Second Edition)
- Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath
- 2021-07-09 21:07:48
Vectors in Spark
Spark MLlib uses Breeze and JBlas for its internal linear algebra operations. It represents a vector with its own class, org.apache.spark.mllib.linalg.Vector, and creates instances through the org.apache.spark.mllib.linalg.Vectors factory. A local vector has integer-typed, 0-based indices and double-typed values. It is stored on a single machine and cannot be distributed. Spark MLlib supports two types of local vectors, dense and sparse, both created using factory methods.
The following code snippet shows how to create basic sparse and dense vectors in Spark:
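import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Create a dense vector (1.0, 0.0, 2.0).
val dVectorOne: Vector = Vectors.dense(1.0, 0.0, 2.0)
println("dVectorOne:" + dVectorOne)
// Create a sparse vector (1.0, 0.0, 2.0, 3.0) by specifying the indices
// and values of its nonzero entries.
val sVectorOne: Vector = Vectors.sparse(4, Array(0, 2, 3),
  Array(1.0, 2.0, 3.0))
println("sVectorOne:" + sVectorOne)
// Create a sparse vector (1.0, 0.0, 2.0, 3.0) by specifying its
// nonzero entries as (index, value) pairs.
val sVectorTwo: Vector = Vectors.sparse(4, Seq((0, 1.0), (2, 2.0),
  (3, 3.0)))
println("sVectorTwo:" + sVectorTwo)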
The preceding code produces the following output:
dVectorOne:[1.0,0.0,2.0]
sVectorOne:(4,[0,2,3],[1.0,2.0,3.0])
sVectorTwo:(4,[0,2,3],[1.0,2.0,3.0])
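The sparse output above reads as (size, indices, values). As a rough illustration of that layout, here is a minimal plain Scala sketch that expands such a triple into a dense array. It does not use Spark, and the SparseSketch class is a hypothetical name introduced only for this example; it is not how MLlib implements sparse vectors.

```scala
// Hypothetical illustration of the (size, indices, values) layout; not Spark code.
final case class SparseSketch(size: Int, indices: Array[Int], values: Array[Double]) {
  require(indices.length == values.length, "indices and values must align")

  // Expand to a dense array: positions that are not listed default to 0.0.
  def toDenseArray: Array[Double] = {
    val dense = Array.fill(size)(0.0)
    indices.zip(values).foreach { case (i, v) => dense(i) = v }
    dense
  }
}

object SparseSketchDemo {
  def main(args: Array[String]): Unit = {
    val s = SparseSketch(4, Array(0, 2, 3), Array(1.0, 2.0, 3.0))
    // prints [1.0,0.0,2.0,3.0]
    println(s.toDenseArray.mkString("[", ",", "]"))
  }
}
```

Only three values are stored for a vector of size 4 here; the saving grows as the share of zero entries grows.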
Spark exposes various methods for inspecting a vector and accessing its values, as shown next:
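// Index of the largest value, number of nonzero entries, and vector size.
val sVectorOneMax = sVectorOne.argmax
val sVectorOneNumNonZeros = sVectorOne.numNonzeros
val sVectorOneSize = sVectorOne.size
// Dense Array[Double] copy and JSON representation of the vector.
val sVectorOneArray = sVectorOne.toArray
val sVectorOneJson = sVectorOne.toJson
println("sVectorOneMax:" + sVectorOneMax)
println("sVectorOneNumNonZeros:" + sVectorOneNumNonZeros)
println("sVectorOneSize:" + sVectorOneSize)
// Concatenating an Array prints its JVM identity (for example, [D@...);
// use sVectorOneArray.mkString(",") to print the elements instead.
println("sVectorOneArray:" + sVectorOneArray)
println("sVectorOneJson:" + sVectorOneJson)
// Convert the dense vector to sparse storage, dropping its zero entry.
val dVectorOneToSparse = dVectorOne.toSparse
println("dVectorOneToSparse:" + dVectorOneToSparse)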
The preceding code produces the following output:
sVectorOneMax:3
sVectorOneNumNonZeros:3
sVectorOneSize:4
sVectorOneArray:[D@38684d54
sVectorOneJson:{"type":0,"size":4,"indices":[0,2,3],"values":[1.0,2.0,3.0]}
dVectorOneToSparse:(3,[0,2],[1.0,2.0])
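Beyond the accessors shown above, a few other calls are often useful when working with local vectors. The following is a minimal sketch, assuming the spark-mllib library is on the classpath; the object name VectorOpsSketch and the variables sv and dv are introduced here for illustration only. It shows element access by index, conversion to dense storage, and the Vectors.norm and Vectors.sqdist helpers.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Illustrative sketch only; requires spark-mllib on the classpath.
object VectorOpsSketch {
  def main(args: Array[String]): Unit = {
    val sv: Vector = Vectors.sparse(4, Array(0, 2, 3), Array(1.0, 2.0, 3.0))
    val dv: Vector = Vectors.dense(1.0, 0.0, 2.0, 3.0)

    // Element access by 0-based index; index 1 is not stored, so it reads as 0.0.
    println("sv(1):" + sv(1))
    // Convert the sparse vector to dense storage.
    println("sv.toDense:" + sv.toDense)
    // L2 norm of the vector, that is, sqrt(1 + 4 + 9).
    println("L2 norm:" + Vectors.norm(sv, 2.0))
    // Squared Euclidean distance between the sparse and dense vectors.
    println("sqdist:" + Vectors.sqdist(sv, dv))
  }
}
```

Both helpers accept any mix of dense and sparse arguments, so the storage format does not need to be unified before calling them.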