- Hands-On Big Data Analytics with PySpark
- Rudy Lai, Bartłomiej Potaczek
Getting Your Big Data into the Spark Environment Using RDDs
Primarily, this chapter will provide a brief overview of how to get your big data into the Spark environment using resilient distributed datasets (RDDs). We will be using a wide array of tools to interact with and modify this data so that useful insights can be extracted. We will first load the data into Spark RDDs and then carry out parallelization with Spark RDDs.
In this chapter, we will cover the following topics:
- Loading data onto Spark RDDs
- Parallelization with Spark RDDs
- Basics of RDD operation
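As a quick preview of the first two topics, here is a minimal sketch of creating RDDs in PySpark, assuming a local Spark installation; the application name and the file path are illustrative placeholders, not values from the book.

```python
from pyspark import SparkContext

# Create a SparkContext running locally on all available cores.
sc = SparkContext("local[*]", "RDDBasicsSketch")

# Parallelization: distribute an in-memory Python collection as an RDD.
numbers = sc.parallelize([1, 2, 3, 4, 5])
print(numbers.map(lambda x: x * 2).collect())  # [2, 4, 6, 8, 10]

# Loading data: read an external text file into an RDD.
# The path below is a placeholder; point it at a real file before running.
# lines = sc.textFile("data/sample.csv")
# print(lines.take(5))

sc.stop()
```

The same `SparkContext` is used throughout the chapter, whether the data starts as a local collection (`parallelize`) or as an external file (`textFile`).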