- Spark Cookbook
- Rishi Yadav
- 2021-07-16 13:44:01
Introduction
Spark provides a unified runtime for big data. HDFS, Hadoop's filesystem, is the most widely used storage platform for Spark because it provides cost-effective storage for unstructured and semi-structured data on commodity hardware. Spark is not limited to HDFS, however, and can work with any Hadoop-supported storage.
Hadoop-supported storage means a storage format that can work with Hadoop's InputFormat and OutputFormat interfaces. InputFormat is responsible for creating InputSplits from input data and dividing them further into records. OutputFormat is responsible for writing to storage.
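To make the split-then-record idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Hadoop's or Spark's API: the function names make_splits and read_records are hypothetical, and the boundary rule (a reader that does not start at byte 0 skips its first partial line, while the previous reader finishes any line that begins at or before its own end) is an assumption modelled on how line-oriented text input is commonly handled.

```python
from typing import Iterator, List, Tuple


def make_splits(total_bytes: int, split_size: int) -> List[Tuple[int, int]]:
    """Cut the input into (start, end) byte ranges, loosely like InputSplits."""
    return [(start, min(start + split_size, total_bytes))
            for start in range(0, total_bytes, split_size)]


def read_records(data: bytes, start: int, end: int) -> Iterator[bytes]:
    """Yield the newline-delimited records owned by the split [start, end)."""
    pos = start
    if start != 0:
        # Assumed rule: discard the first (possibly partial) line, because
        # the reader of the previous split is responsible for it.
        newline = data.find(b"\n", start)
        if newline == -1:
            return
        pos = newline + 1
    # Read every line that begins at or before `end`, even if it runs past it.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            # Last line without a trailing newline.
            yield data[pos:]
            return
        yield data[pos:newline]
        pos = newline + 1
```

Calling make_splits(len(data), 6) on a small byte string and then running read_records over each range returns every line exactly once, even when a split boundary falls in the middle of a line. That is the property a record reader needs so that splits can be processed independently.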
We will start with writing to the local filesystem and then move over to loading data from HDFS. In the Loading data from HDFS recipe, we will cover the most common file format: regular text files. In the next recipe, we will cover how to use any InputFormat interface to load data in Spark. We will also explore loading data stored in Amazon S3, a leading cloud storage platform.
We will explore loading data from Apache Cassandra, which is a NoSQL database. Finally, we will explore loading data from a relational database.