Mastering Machine Learning with Spark 2.x
Latest chapter: Summary
Are you a developer with a background in machine learning and statistics who is feeling limited by the current slow and “small data” machine learning tools? Then this is the book for you! In this book, you will create scalable machine learning applications to power a modern data-driven business using Spark. We assume that you already know machine learning concepts and algorithms, have Spark up and running (whether on a cluster or locally), and have a basic knowledge of the various libraries contained in Spark.
Table of Contents (183 sections)
- Cover
- Title Page
- Copyright
- Mastering Machine Learning with Spark 2.x
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Introduction to Large-Scale Machine Learning and Spark
- Data science
- The sexiest role of the 21st century – data scientist?
- A day in the life of a data scientist
- Working with big data
- The machine learning algorithm using a distributed environment
- Splitting of data into multiple machines
- From Hadoop MapReduce to Spark
- What is Databricks?
- Inside the box
- Introducing H2O.ai
- Design of Sparkling Water
- What's the difference between H2O and Spark's MLlib?
- Data munging
- Data science - an iterative process
- Summary
- Detecting Dark Matter - The Higgs-Boson Particle
- Type I versus type II error
- Finding the Higgs-Boson particle
- The LHC and data creation
- The theory behind the Higgs-Boson
- Measuring for the Higgs-Boson
- The dataset
- Spark start and data load
- Labeled point vector
- Data caching
- Creating a training and testing set
- What about cross-validation?
- Our first model – decision tree
- Gini versus Entropy
- Next model – tree ensembles
- Random forest model
- Grid search
- Gradient boosting machine
- Last model - H2O deep learning
- Build a 3-layer DNN
- Adding more layers
- Building models and inspecting results
- Summary
- Ensemble Methods for Multi-Class Classification
- Data
- Modeling goal
- Challenges
- Machine learning workflow
- Starting Spark shell
- Exploring data
- Missing data
- Summary of missing value analysis
- Data unification
- Missing values
- Categorical values
- Final transformation
- Modelling data with Random Forest
- Building a classification model using Spark RandomForest
- Classification model evaluation
- Spark model metrics
- Building a classification model using H2O RandomForest
- Summary
- Predicting Movie Reviews Using NLP and Spark Streaming
- NLP - a brief primer
- The dataset
- Dataset preparation
- Feature extraction
- Feature extraction method – bag-of-words model
- Text tokenization
- Declaring our stopwords list
- Stemming and lemmatization
- Featurization - feature hashing
- Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme
- Let's do some (model) training!
- Spark decision tree model
- Spark Naive Bayes model
- Spark random forest model
- Spark GBM model
- Super-learner model
- Super learner
- Composing all transformations together
- Using the super-learner model
- Summary
- Word2vec for Prediction and Clustering
- Motivation of word vectors
- Word2vec explained
- What is a word vector?
- The CBOW model
- The skip-gram model
- Fun with word vectors
- Cosine similarity
- Doc2vec explained
- The distributed-memory model
- The distributed bag-of-words model
- Applying word2vec and exploring our data with vectors
- Creating document vectors
- Supervised learning task
- Summary
- Extracting Patterns from Clickstream Data
- Frequent pattern mining
- Pattern mining terminology
- Frequent pattern mining problem
- The association rule mining problem
- The sequential pattern mining problem
- Pattern mining with Spark MLlib
- Frequent pattern mining with FP-growth
- Association rule mining
- Sequential pattern mining with PrefixSpan
- Pattern mining on MSNBC clickstream data
- Deploying a pattern mining application
- The Spark Streaming module
- Summary
- Graph Analytics with GraphX
- Basic graph theory
- Graphs
- Directed and undirected graphs
- Order and degree
- Directed acyclic graphs
- Connected components
- Trees
- Multigraphs
- Property graphs
- GraphX distributed graph processing engine
- Graph representation in GraphX
- Graph properties and operations
- Building and loading graphs
- Visualizing graphs with Gephi
- Gephi
- Creating GEXF files from GraphX graphs
- Advanced graph processing
- Aggregating messages
- Pregel
- GraphFrames
- Graph algorithms and applications
- Clustering
- Vertex importance
- GraphX in context
- Summary
- Lending Club Loan Prediction
- Motivation
- Goal
- Data
- Data dictionary
- Preparation of the environment
- Data load
- Exploration – data analysis
- Basic clean up
- Useless columns
- String columns
- Loan progress columns
- Categorical columns
- Text columns
- Missing data
- Prediction targets
- Loan status model
- Base model
- The emp_title column transformation
- The desc column transformation
- Interest Rate Model
- Using models for scoring
- Model deployment
- Stream creation
- Stream transformation
- Stream output
- Summary

Last updated: 2021-07-02 18:46:37
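Recommended reading
- HTML5+CSS3+JavaScript from Beginner to Expert: Volume 1 (Micro-lecture Edition, 2nd Edition)
- Learning C++ Through Fun Case Studies
- Embedded Software System Testing: Automated Testing Solutions Based on Formal Methods
- Python for Secret Agents:Volume II
- Scratch Programming for Kids with Zero Background: Creative Scratch Programming in Primary School Textbooks
- Mastering Linux (2nd Edition)
- Core Java High-Concurrency Programming (Volume 1): NIO, Netty, Redis, ZooKeeper
- Scientific Plotting and Academic Charts in Python from Beginner to Expert
- Multimedia Technology and Applications
- Instant Apache Camel Messaging System
- Enterprise Upgrading in the Big Data Era (3-Volume Set)
- Python Social Media Analytics
- Android Embedded System Programming (Based on Cortex-A8)
- C Programming: Lab Guide and Selected Exercise Solutions
- Java Performance Optimization in Practice
- Fundamentals of Computer Applications
- Build Your Own Smart Products: An Embedded JavaScript Implementation
- Python Network Operations Automation
- The C# Programming Magic Book
- Mastering Puppet (Second Edition)
- Hands-On Machine Learning with ML.NET
- Software Testing Techniques in Practice: Design, Tools, and Management
- Apache Kafka
- Eclipse 4 Plug-in Development by Example Beginner's Guide
- Julia High Performance
- Beginner's Guide to PHP Programming (2-Volume Set)
- AI Programming for Teens: Mastering mBlock with the HaloCode Board
- Learning F# Functional Data Structures and Algorithms
- Visual Basic Programming Tutorial (3rd Edition)
- Robust Python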