Mastering Apache Spark 2.x (Second Edition)
Contents (235 chapters)
- Cover Page
- Title Page
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- A First Taste and What’s New in Apache Spark V2
- Spark machine learning
- Spark Streaming
- Spark SQL
- Spark graph processing
- Extended ecosystem
- What's new in Apache Spark V2?
- Cluster design
- Cluster management
- Local
- Standalone
- Apache YARN
- Apache Mesos
- Cloud-based deployments
- Performance
- The cluster structure
- Hadoop Distributed File System
- Data locality
- Memory
- Coding
- Cloud
- Summary
- Apache Spark SQL
- The SparkSession--your gateway to structured data processing
- Importing and saving data
- Processing the text files
- Processing JSON files
- Processing the Parquet files
- Understanding the DataSource API
- Implicit schema discovery
- Predicate push-down on smart data sources
- DataFrames
- Using SQL
- Defining schemas manually
- Using SQL subqueries
- Applying SQL table joins
- Using Datasets
- The Dataset API in action
- User-defined functions
- RDDs versus DataFrames versus Datasets
- Summary
- The Catalyst Optimizer
- Understanding the workings of the Catalyst Optimizer
- Managing temporary views with the catalog API
- The SQL abstract syntax tree
- How to go from Unresolved Logical Execution Plan to Resolved Logical Execution Plan
- Internal class and object representations of LEPs
- How to optimize the Resolved Logical Execution Plan
- Physical Execution Plan generation and selection
- Code generation
- Practical examples
- Using the explain method to obtain the PEP
- How smart data sources work internally
- Summary
- Project Tungsten
- Memory management beyond the Java Virtual Machine Garbage Collector
- Understanding the UnsafeRow object
- The null bit set region
- The fixed length values region
- The variable length values region
- Understanding the BytesToBytesMap
- A practical example on memory usage and performance
- Cache-friendly layout of data in memory
- Cache eviction strategies and pre-fetching
- Code generation
- Understanding columnar storage
- Understanding whole stage code generation
- A practical example on whole stage code generation performance
- Operator fusing versus the volcano iterator model
- Summary
- Apache Spark Streaming
- Overview
- Errors and recovery
- Checkpointing
- Streaming sources
- TCP stream
- File streams
- Flume
- Kafka
- Summary
- Structured Streaming
- The concept of continuous applications
- True unification - same code, same engine
- Windowing
- How streaming engines use windowing
- How Apache Spark improves windowing
- Increased performance with good old friends
- How transparent fault tolerance and exactly-once delivery guarantee is achieved
- Replayable sources can replay streams from a given offset
- Idempotent sinks prevent data duplication
- State versioning guarantees consistent results after reruns
- Example - connection to an MQTT message broker
- Controlling continuous applications
- More on stream life cycle management
- Summary
- Apache Spark MLlib
- Architecture
- The development environment
- Classification with Naive Bayes
- Theory on Classification
- Naive Bayes in practice
- Clustering with K-Means
- Theory on Clustering
- K-Means in practice
- Artificial neural networks
- ANN in practice
- Summary
- Apache SparkML
- What does the new API look like?
- The concept of pipelines
- Transformers
- String indexer
- OneHotEncoder
- VectorAssembler
- Pipelines
- Estimators
- RandomForestClassifier
- Model evaluation
- CrossValidation and hyperparameter tuning
- CrossValidation
- Hyperparameter tuning
- Winning a Kaggle competition with Apache SparkML
- Data preparation
- Feature engineering
- Testing the feature engineering pipeline
- Training the machine learning model
- Model evaluation
- CrossValidation and hyperparameter tuning
- Using the evaluator to assess the quality of the cross-validated and tuned model
- Summary
- Apache SystemML
- Why do we need just another library?
- Why on Apache Spark?
- The history of Apache SystemML
- A cost-based optimizer for machine learning algorithms
- An example - alternating least squares
- Apache SystemML architecture
- Language parsing
- High-level operators are generated
- How low-level operators are optimized
- Performance measurements
- Apache SystemML in action
- Summary
- Deep Learning on Apache Spark with DeepLearning4j and H2O
- H2O
- Overview
- The build environment
- Architecture
- Sourcing the data
- Data quality
- Performance tuning
- Deep Learning
- Example code – income
- The example code – MNIST
- H2O Flow
- Deeplearning4j
- ND4J - high performance linear algebra for the JVM
- Deeplearning4j
- Example: an IoT real-time anomaly detector
- Mastering chaos: the Lorenz attractor model
- Deploying the test data generator
- Deploying the Node-RED IoT Starter Boilerplate to the IBM Cloud
- Deploying the test data generator flow
- Testing the test data generator
- Installing the Deeplearning4j example within Eclipse
- Running the examples in Eclipse
- Running the examples in Apache Spark
- Summary
- Apache Spark GraphX
- Overview
- Graph analytics/processing with GraphX
- The raw data
- Creating a graph
- Example 1 – counting
- Example 2 – filtering
- Example 3 – PageRank
- Example 4 – triangle counting
- Example 5 – connected components
- Summary
- Apache Spark GraphFrames
- Architecture
- Graph-relational translation
- Materialized views
- Join elimination
- Join reordering
- Examples
- Example 1 – counting
- Example 2 – filtering
- Example 3 – PageRank
- Example 4 – triangle counting
- Example 5 – connected components
- Summary
- Apache Spark with Jupyter Notebooks on IBM DataScience Experience
- Why notebooks are the new standard
- Learning by example
- The IEEE PHM 2012 data challenge bearing dataset
- ETL with Scala
- Interactive exploratory analysis using Python and Pixiedust
- Real data science work with SparkR
- Summary
- Apache Spark on Kubernetes
- Bare metal virtual machines and containers
- Containerization
- Namespaces
- Control groups
- Linux containers
- Understanding the core concepts of Docker
- Understanding Kubernetes
- Using Kubernetes for provisioning containerized Spark applications
- Example--Apache Spark on Kubernetes
- Prerequisites
- Deploying the Apache Spark master
- Deploying the Apache Spark workers
- Deploying the Zeppelin notebooks
- Summary