Table of Contents (274 chapters)
- Cover Page
- Title Page
- Credits
- Foreword
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Errata
- Piracy
- Questions
- Introduction to Spark
- Dimensions of big data
- What makes Hadoop so revolutionary?
- Defining HDFS
- NameNode
- HDFS I/O
- YARN
- Processing the flow of application submission in YARN
- Overview of MapReduce
- Why Apache Spark?
- RDD - the first citizen of Spark
- Operations on RDD
- Lazy evaluation
- Benefits of RDD
- Exploring the Spark ecosystem
- What's new in Spark 2.X?
- References
- Summary
- Revisiting Java
- Why use Java for Spark?
- Generics
- Creating your own generic type
- Interfaces
- Static method in an interface
- Default method in interface
- What if a class implements two interfaces that have default methods with the same name and signature?
- Anonymous inner classes
- Lambda expressions
- Functional interface
- Syntax of Lambda expressions
- Lexical scoping
- Method reference
- Understanding closures
- Streams
- Generating streams
- Intermediate operations
- Working with intermediate operations
- Terminal operations
- Working with terminal operations
- String collectors
- Collection collectors
- Map collectors
- Groupings
- Partitioning
- Matching
- Finding elements
- Summary
- Let Us Spark
- Getting started with Spark
- Spark REPL also known as CLI
- Some basic exercises using Spark shell
- Checking Spark version
- Creating and filtering RDD
- Word count on RDD
- Finding the sum of all even numbers in an RDD of integers
- Counting the number of words in a file
- Spark components
- Spark Driver Web UI
- Jobs
- Stages
- Storage
- Environment
- Executors
- SQL
- Streaming
- Spark job configuration and submission
- Spark REST APIs
- Summary
- Understanding the Spark Programming Model
- Hello Spark
- Prerequisites
- Common RDD transformations
- Map
- Filter
- flatMap
- mapToPair
- flatMapToPair
- union
- Intersection
- Distinct
- Cartesian
- groupByKey
- reduceByKey
- sortByKey
- Join
- CoGroup
- Common RDD actions
- isEmpty
- collect
- collectAsMap
- count
- countByKey
- countByValue
- Max
- Min
- First
- Take
- takeOrdered
- takeSample
- top
- reduce
- Fold
- aggregate
- forEach
- saveAsTextFile
- saveAsObjectFile
- RDD persistence and cache
- Summary
- Working with Data and Storage
- Interaction with external storage systems
- Interaction with local filesystem
- Interaction with Amazon S3
- Interaction with HDFS
- Interaction with Cassandra
- Working with different data formats
- Plain and specially formatted text
- Working with CSV data
- Working with JSON data
- Working with XML Data
- References
- Summary
- Spark on Cluster
- Spark application in distributed-mode
- Driver program
- Executor program
- Cluster managers
- Spark standalone
- Installation of Spark standalone cluster
- Start master
- Start slave
- Stop master and slaves
- Deploying applications on Spark standalone cluster
- Client mode
- Cluster mode
- Useful job configurations
- Useful cluster level configurations (Spark standalone)
- Yet Another Resource Negotiator (YARN)
- YARN client
- YARN cluster
- Useful job configuration
- Summary
- Spark Programming Model - Advanced
- RDD partitioning
- Repartitioning
- How Spark calculates the partition count for transformations with shuffling (wide transformations)
- Partitioner
- Hash Partitioner
- Range Partitioner
- Custom Partitioner
- Advanced transformations
- mapPartitions
- mapPartitionsWithIndex
- mapPartitionsToPair
- mapValues
- flatMapValues
- repartitionAndSortWithinPartitions
- coalesce
- foldByKey
- aggregateByKey
- combineByKey
- Advanced actions
- Approximate actions
- Asynchronous actions
- Miscellaneous actions
- Shared variable
- Broadcast variable
- Properties of the broadcast variable
- Lifecycle of a broadcast variable
- Map-side join using broadcast variable
- Accumulators
- Driver program
- Summary
- Working with Spark SQL
- SQLContext and HiveContext
- Initializing SparkSession
- Reading CSV using SparkSession
- Dataframe and dataset
- SchemaRDD
- Dataframe
- Dataset
- Creating a dataset using encoders
- Creating a dataset using StructType
- Unified dataframe and dataset API
- Data persistence
- Spark SQL operations
- Untyped dataset operation
- Temporary view
- Global temporary view
- Spark UDF
- Spark UDAF
- Untyped UDAF
- Type-safe UDAF
- Hive integration
- Table Persistence
- Summary
- Near Real-Time Processing with Spark Streaming
- Introducing Spark Streaming
- Understanding micro batching
- Getting started with Spark Streaming jobs
- Streaming sources
- fileStream
- Kafka
- Streaming transformations
- Stateless transformation
- Stateful transformation
- Checkpointing
- Windowing
- Transform operation
- Fault tolerance and reliability
- Data receiver stage
- File streams
- Advanced streaming sources
- Transformation stage
- Output stage
- Structured Streaming
- Recap of the use case
- Structured streaming - programming model
- Built-in input sources and sinks
- Input sources
- Built-in Sinks
- Summary
- Machine Learning Analytics with Spark MLlib
- Introduction to machine learning
- Concepts of machine learning
- Datatypes
- Machine learning workflow
- Pipelines
- Operations on feature vectors
- Feature extractors
- Feature transformers
- Feature selectors
- Summary
- Learning Spark GraphX
- Introduction to GraphX
- Introduction to Property Graph
- Getting started with the GraphX API
- Using vertex and edge RDDs
- From edges
- EdgeTriplet
- Graph operations
- mapVertices
- mapEdges
- mapTriplets
- reverse
- subgraph
- aggregateMessages
- outerJoinVertices
- Graph algorithms
- PageRank
- Static PageRank
- Dynamic PageRank
- Triangle counting
- Connected components
- Summary
Updated: 2021-07-02 19:02:35