- Machine Learning with Spark(Second Edition)
- Rajdeep Dua Manpreet Singh Ghotra Nick Pentreath
- 222字
- 2021-07-09 21:07:48
Vectors in Spark
Spark MLlib uses Breeze and JBlas for internal linear algebraic operations. It uses its own class to represent a vector defined using the org.apache.spark.mllib.linalg.Vector factory. A local vector has integer-typed and 0-based indices. Its values are stored as double-typed. A local vector is stored on a single machine, and cannot be distributed. Spark MLlib supports two types of local vectors, dense and sparse, created using factory methods.
The following code snippet shows how to create basic sparse and dense vectors in Spark:
val dVectorOne: Vector = Vectors.dense(1.0, 0.0, 2.0)
println("dVectorOne:" + dVectorOne)
// Sparse vector (1.0, 0.0, 2.0, 3.0)
// corresponding to nonzero entries.
val sVectorOne: Vector = Vectors.sparse(4, Array(0, 2,3),
Array(1.0, 2.0, 3.0))
// Create a sparse vector (1.0, 0.0, 2.0, 2.0) by specifying its
// nonzero entries.
val sVectorTwo: Vector = Vectors.sparse(4, Seq((0, 1.0), (2, 2.0),
(3, 3.0)))
The preceding code produces the following output:
dVectorOne:[1.0,0.0,2.0]
sVectorOne:(4,[0,2,3],[1.0,2.0,3.0])
sVectorTwo:(4,[0,2,3],[1.0,2.0,3.0])
There are various methods exposed by Spark for accessing and discovering vector values as shown next:
val sVectorOneMax = sVectorOne.argmax
val sVectorOneNumNonZeros = sVectorOne.numNonzeros
val sVectorOneSize = sVectorOne.size
val sVectorOneArray = sVectorOne.toArray
val sVectorOneJson = sVectorOne.toJson
println("sVectorOneMax:" + sVectorOneMax)
println("sVectorOneNumNonZeros:" + sVectorOneNumNonZeros)
println("sVectorOneSize:" + sVectorOneSize)
println("sVectorOneArray:" + sVectorOneArray)
println("sVectorOneJson:" + sVectorOneJson)
val dVectorOneToSparse = dVectorOne.toSparse
The preceding code produces the following output:
sVectorOneMax:3
sVectorOneNumNonZeros:3
sVectorOneSize:4
sVectorOneArray:[D@38684d54
sVectorOneJson:{"type":0,"size":4,"indices":[0,2,3],"values":
[1.0,2.0,3.0]}
dVectorOneToSparse:(3,[0,2],[1.0,2.0])
推薦閱讀
- Python Artificial Intelligence Projects for Beginners
- Security Automation with Ansible 2
- 傳感器技術應用
- Mastering Elastic Stack
- 數據掘金
- Enterprise PowerShell Scripting Bootcamp
- Excel 2007技巧大全
- Photoshop CS5圖像處理入門、進階與提高
- 單片機技術項目化原理與實訓
- ZigBee無線通信技術應用開發
- 數據要素:全球經濟社會發展的新動力
- FANUC工業機器人配置與編程技術
- Java求職寶典
- x86/x64體系探索及編程
- 智能座艙之車載機器人交互設計與開發