Hands-On Data Analysis with Scala
Efficient business decisions grounded in an accurate sense of business data help deliver better performance across products and services. This book helps you leverage popular Scala libraries and tools to perform core data analysis tasks with ease. The book begins with a quick overview of the building blocks of a standard data analysis process. You will learn to perform basic tasks such as extraction, staging, validation, cleaning, and shaping of datasets. You will later dive deep into the data exploration and visualization areas of the data analysis life cycle. You will make use of popular Scala libraries like Saddle, Breeze, Vegas, and PredictionIO for processing your datasets. You will learn statistical methods for deriving meaningful insights from data. You will also learn to create applications for Apache Spark 2.x that perform complex data analysis in real time. You will discover traditional machine learning techniques for doing data analysis. Furthermore, you will be introduced to neural networks and deep learning from a data analysis standpoint. By the end of this book, you will be capable of handling large sets of structured and unstructured data, performing exploratory analysis, and building efficient Scala applications for discovering and delivering insights.
Table of Contents (158 chapters)
- Cover Page
- Title Page
- Copyright and Credits
- Hands-On Data Analysis with Scala
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Section 1: Scala and Data Analysis Life Cycle
- Scala Overview
- Getting started with Scala
- Running Scala code online
- Scastie
- ScalaFiddle
- Installing Scala on your computer
- Installing command-line tools
- Installing IDE
- Overview of object-oriented and functional programming
- Object-oriented programming using Scala
- Functional programming using Scala
- Scala case classes and the collection API
- Scala case classes
- Scala collection API
- Array
- List
- Map
- Overview of Scala libraries for data analysis
- Apache Spark
- Breeze
- Breeze-viz
- DeepLearning
- Epic
- Saddle
- Scalalab
- Smile
- Vegas
- Summary
- Data Analysis Life Cycle
- Data journey
- Sourcing data
- Data formats
- XML
- JSON
- CSV
- Understanding data
- Using statistical methods for data exploration
- Using Scala
- Other Scala tools
- Using data visualization for data exploration
- Using the vegas-viz library for data visualization
- Other libraries for data visualization
- Using ML to learn from data
- Setting up Smile
- Running Smile
- Creating a data pipeline
- Summary
- Data Ingestion
- Data extraction
- Pull-oriented data extraction
- Push-oriented data delivery
- Data staging
- Why is the staging important?
- Cleaning and normalizing
- Enriching
- Organizing and storing
- Summary
- Data Exploration and Visualization
- Sampling data
- Selecting the sample
- Selecting samples using Saddle
- Performing ad hoc analysis
- Finding a relationship between data elements
- Visualizing data
- Vegas viz for data visualization
- Spark Notebook for data visualization
- Downloading and installing Spark Notebook
- Creating a Spark Notebook with simple visuals
- More charts with Spark Notebook
- Box plot
- Histogram
- Bubble chart
- Summary
- Applying Statistics and Hypothesis Testing
- Basics of statistics
- Summary level statistics
- Correlation statistics
- Vector level statistics
- Random data generation
- Pseudorandom numbers
- Random numbers with normal distribution
- Random numbers with Poisson distribution
- Hypothesis testing
- Summary
- Section 2: Advanced Data Analysis and Machine Learning
- Introduction to Spark for Distributed Data Analysis
- Spark setup and overview
- Spark core concepts
- Spark Datasets and DataFrames
- Sourcing data using Spark
- Parquet file format
- Avro file format
- Spark JDBC integration
- Using Spark to explore data
- Summary
- Traditional Machine Learning for Data Analysis
- ML overview
- Characteristics of ML
- Categories or types of ML
- Decision trees
- Implementing decision trees
- Decision tree algorithms
- Implementing decision tree algorithms in our example
- Evaluating the results
- Using our model with a decision tree
- Random forest
- Random forest algorithms
- Ridge and lasso regression
- Characteristics of ridge regression
- Characteristics of lasso regression
- k-means cluster analysis
- Natural language processing for data analysis
- Algorithm selections
- Summary
- Section 3: Real-Time Data Analysis and Scalability
- Near Real-Time Data Analysis Using Streaming
- Overview of streaming
- Spark Streaming overview
- Word count using pure Scala
- Word count using Scala and Spark
- Word count using Scala and Spark Streaming
- Deep dive into the Spark Streaming solution
- Streaming a k-means clustering algorithm using Spark
- Streaming linear regression using Spark
- Summary
- Working with Data at Scale
- Working with data at scale
- Cost considerations
- Data storage
- Data governance
- Reliability considerations
- Input data errors
- Processing failures
- Summary
- Another Book You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 14:51:32