Mastering Machine Learning with Spark 2.x
Latest chapter:
Summary
Are you a developer with a background in machine learning and statistics who is feeling limited by the current slow and “small data” machine learning tools? Then this is the book for you! In this book, you will create scalable machine learning applications to power a modern data-driven business using Spark. We assume that you already know the machine learning concepts and algorithms and have Spark up and running (whether on a cluster or locally) and have a basic knowledge of the various libraries contained in Spark.
Table of contents (183 chapters)
- cover
- Title Page
- Copyright
- Mastering Machine Learning with Spark 2.x
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Introduction to Large-Scale Machine Learning and Spark
- Data science
- The sexiest role of the 21st century – data scientist?
- A day in the life of a data scientist
- Working with big data
- The machine learning algorithm using a distributed environment
- Splitting of data into multiple machines
- From Hadoop MapReduce to Spark
- What is Databricks?
- Inside the box
- Introducing H2O.ai
- Design of Sparkling Water
- What's the difference between H2O and Spark's MLlib?
- Data munging
- Data science - an iterative process
- Summary
- Detecting Dark Matter - The Higgs-Boson Particle
- Type I versus type II error
- Finding the Higgs-Boson particle
- The LHC and data creation
- The theory behind the Higgs-Boson
- Measuring for the Higgs-Boson
- The dataset
- Spark start and data load
- Labeled point vector
- Data caching
- Creating a training and testing set
- What about cross-validation?
- Our first model – decision tree
- Gini versus Entropy
- Next model – tree ensembles
- Random forest model
- Grid search
- Gradient boosting machine
- Last model - H2O deep learning
- Build a 3-layer DNN
- Adding more layers
- Building models and inspecting results
- Summary
- Ensemble Methods for Multi-Class Classification
- Data
- Modeling goal
- Challenges
- Machine learning workflow
- Starting Spark shell
- Exploring data
- Missing data
- Summary of missing value analysis
- Data unification
- Missing values
- Categorical values
- Final transformation
- Modelling data with Random Forest
- Building a classification model using Spark RandomForest
- Classification model evaluation
- Spark model metrics
- Building a classification model using H2O RandomForest
- Summary
- Predicting Movie Reviews Using NLP and Spark Streaming
- NLP - a brief primer
- The dataset
- Dataset preparation
- Feature extraction
- Feature extraction method – bag-of-words model
- Text tokenization
- Declaring our stopwords list
- Stemming and lemmatization
- Featurization - feature hashing
- Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme
- Let's do some (model) training!
- Spark decision tree model
- Spark Naive Bayes model
- Spark random forest model
- Spark GBM model
- Super-learner model
- Super learner
- Composing all transformations together
- Using the super-learner model
- Summary
- Word2vec for Prediction and Clustering
- Motivation of word vectors
- Word2vec explained
- What is a word vector?
- The CBOW model
- The skip-gram model
- Fun with word vectors
- Cosine similarity
- Doc2vec explained
- The distributed-memory model
- The distributed bag-of-words model
- Applying word2vec and exploring our data with vectors
- Creating document vectors
- Supervised learning task
- Summary
- Extracting Patterns from Clickstream Data
- Frequent pattern mining
- Pattern mining terminology
- Frequent pattern mining problem
- The association rule mining problem
- The sequential pattern mining problem
- Pattern mining with Spark MLlib
- Frequent pattern mining with FP-growth
- Association rule mining
- Sequential pattern mining with prefix span
- Pattern mining on MSNBC clickstream data
- Deploying a pattern mining application
- The Spark Streaming module
- Summary
- Graph Analytics with GraphX
- Basic graph theory
- Graphs
- Directed and undirected graphs
- Order and degree
- Directed acyclic graphs
- Connected components
- Trees
- Multigraphs
- Property graphs
- GraphX distributed graph processing engine
- Graph representation in GraphX
- Graph properties and operations
- Building and loading graphs
- Visualizing graphs with Gephi
- Gephi
- Creating GEXF files from GraphX graphs
- Advanced graph processing
- Aggregating messages
- Pregel
- GraphFrames
- Graph algorithms and applications
- Clustering
- Vertex importance
- GraphX in context
- Summary
- Lending Club Loan Prediction
- Motivation
- Goal
- Data
- Data dictionary
- Preparation of the environment
- Data load
- Exploration – data analysis
- Basic clean up
- Useless columns
- String columns
- Loan progress columns
- Categorical columns
- Text columns
- Missing data
- Prediction targets
- Loan status model
- Base model
- The emp_title column transformation
- The desc column transformation
- Interest rate model
- Using models for scoring
- Model deployment
- Stream creation
- Stream transformation
- Stream output
- Summary

Last updated: 2021-07-02 18:46:37