Table of Contents (118 sections)
- Cover
- Copyright page
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Why subscribe?
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Chapter 1. Installing Spark and Setting Up Your Cluster
- Directory organization and convention
- Installing the prebuilt distribution
- Building Spark from source
- Spark topology
- A single machine
- Running Spark on EC2
- Deploying Spark with Chef (Opscode)
- Deploying Spark on Mesos
- Spark on YARN
- Spark standalone mode
- References
- Summary
- Chapter 2. Using the Spark Shell
- The Spark shell
- Loading a simple text file
- Interactively loading data from S3
- Summary
- Chapter 3. Building and Running a Spark Application
- Building Spark applications
- Data wrangling with iPython
- Developing Spark with Eclipse
- Developing Spark with other IDEs
- Building your Spark job with Maven
- Building your Spark job with something else
- References
- Summary
- Chapter 4. Creating a SparkSession Object
- SparkSession versus SparkContext
- Building a SparkSession object
- SparkContext - metadata
- Shared Java and Scala APIs
- Python
- iPython
- Reference
- Summary
- Chapter 5. Loading and Saving Data in Spark
- Spark abstractions
- Data modalities
- Data modalities and Datasets/DataFrames/RDDs
- Loading data into an RDD
- Saving your data
- References
- Summary
- Chapter 6. Manipulating Your RDD
- Manipulating your RDD in Scala and Java
- Manipulating your RDD in Python
- References
- Summary
- Chapter 7. Spark 2.0 Concepts
- Code and Datasets for the rest of the book
- The data scientist and Spark features
- Spark v2.0 and beyond
- Apache Spark - evolution
- Apache Spark - the full stack
- The art of a big data store - Parquet
- References
- Summary
- Chapter 8. Spark SQL
- The Spark SQL architecture
- Spark SQL how-to in a nutshell
- Spark SQL programming
- References
- Summary
- Chapter 9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists
- Datasets - a quick introduction
- Dataset APIs - an overview
- Dataset interfaces and functions
- References
- Summary
- Chapter 10. Spark with Big Data
- Parquet - an efficient and interoperable big data format
- HBase
- Reference
- Summary
- Chapter 11. Machine Learning with Spark ML Pipelines
- Spark's machine learning algorithm table
- Spark machine learning APIs - ML pipelines and MLlib
- ML pipelines
- Spark ML examples
- The API organization
- Basic statistics
- Linear regression
- Classification
- Clustering
- Recommendation
- Hyper parameters
- The final thing
- References
- Summary
- Chapter 12. GraphX
- Graphs and graph processing - an introduction
- Spark GraphX
- GraphX - computational model
- The first example - graph
- Building graphs
- The GraphX API landscape
- Structural APIs
- Community affiliation and strengths
- Algorithms
- Partition strategy
- Case study - AlphaGo tweets analytics
- References
- Summary
Updated: 2021-08-20 10:27:33