Practical Big Data Analytics
The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization when it comes to Big Data architecture, analytics, and governance. While no prior knowledge of Big Data or related technologies is assumed, it will be helpful to have some programming experience.
Table of contents (261 chapters)
- Cover
- Copyright information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Too Big or Not Too Big
- What is big data?
- A brief history of data
- Dawn of the information age
- Dr. Alan Turing and modern computing
- The advent of the stored-program computer
- From magnetic devices to SSDs
- Why we are talking about big data now if data has always existed
- Definition of big data
- Building blocks of big data analytics
- Types of Big Data
- Structured
- Unstructured
- Semi-structured
- Sources of big data
- The 4Vs of big data
- When do you know you have a big data problem and where do you start your search for the big data solution?
- Summary
- Big Data Mining for the Masses
- What is big data mining?
- Big data mining in the enterprise
- Building the case for a Big Data strategy
- Implementation life cycle
- Stakeholders of the solution
- Implementing the solution
- Technical elements of the big data platform
- Selection of the hardware stack
- Selection of the software stack
- Summary
- The Analytics Toolkit
- Components of the Analytics Toolkit
- System recommendations
- Installing on a laptop or workstation
- Installing on the cloud
- Installing Hadoop
- Installing Oracle VirtualBox
- Installing CDH in other environments
- Installing Packt Data Science Box
- Installing Spark
- Installing R
- Steps for downloading and installing Microsoft R Open
- Installing RStudio
- Installing Python
- Summary
- Big Data With Hadoop
- The fundamentals of Hadoop
- The fundamental premise of Hadoop
- The core modules of Hadoop
- Hadoop Distributed File System - HDFS
- Data storage process in HDFS
- Hadoop MapReduce
- An intuitive introduction to MapReduce
- A technical understanding of MapReduce
- Block size and number of mappers and reducers
- Hadoop YARN
- Job scheduling in YARN
- Other topics in Hadoop
- Encryption
- User authentication
- Hadoop data storage formats
- New features expected in Hadoop 3
- The Hadoop ecosystem
- Hands-on with CDH
- WordCount using Hadoop MapReduce
- Analyzing oil import prices with Hive
- Joining tables in Hive
- Summary
- Big Data Mining with NoSQL
- Why NoSQL?
- The ACID, BASE, and CAP properties
- ACID and SQL
- The BASE property of NoSQL
- The CAP theorem
- The need for NoSQL technologies
- Google Bigtable
- Amazon Dynamo
- NoSQL databases
- In-memory databases
- Columnar databases
- Document-oriented databases
- Key-value databases
- Graph databases
- Other NoSQL types and summary of other types of databases
- Analyzing Nobel Laureates data with MongoDB
- JSON format
- Installing and using MongoDB
- Tracking physician payments with real-world data
- Installing kdb+, R, and RStudio
- Installing kdb+
- Installing R
- Installing RStudio
- The CMS Open Payments Portal
- Downloading the CMS Open Payments data
- Creating the Q application
- Loading the data
- The backend code
- Creating the frontend web portal
- R Shiny platform for developers
- Putting it all together - The CMS Open Payments application
- Applications
- Summary
- Spark for Big Data Analytics
- The advent of Spark
- Limitations of Hadoop
- Overcoming the limitations of Hadoop
- Theoretical concepts in Spark
- Resilient distributed datasets
- Directed acyclic graphs
- SparkContext
- Spark DataFrames
- Actions and transformations
- Spark deployment options
- Spark APIs
- Core components in Spark
- Spark Core
- Spark SQL
- Spark Streaming
- GraphX
- MLlib
- The architecture of Spark
- Spark solutions
- Spark practicals
- Signing up for Databricks Community Edition
- Spark exercise - hands-on with Spark (Databricks)
- Summary
- An Introduction to Machine Learning Concepts
- What is machine learning?
- The evolution of machine learning
- Factors that led to the success of machine learning
- Machine learning, statistics, and AI
- Categories of machine learning
- Supervised and unsupervised machine learning
- Supervised machine learning
- Vehicle Mileage, Number Recognition, and other examples
- Unsupervised machine learning
- Subdividing supervised machine learning
- Common terminologies in machine learning
- The core concepts in machine learning
- Data management steps in machine learning
- Pre-processing and feature selection techniques
- Centering and scaling
- The near-zero variance function
- Removing correlated variables
- Other common data transformations
- Data sampling
- Data imputation
- The importance of variables
- The train, test splits, and cross-validation concepts
- Splitting the data into train and test sets
- The cross-validation parameter
- Creating the model
- Leveraging multicore processing in the model
- Summary
- Machine Learning Deep Dive
- The bias, variance, and regularization properties
- The gradient descent and VC Dimension theories
- Popular machine learning algorithms
- Regression models
- Association rules
- Confidence
- Support
- Lift
- Decision trees
- The Random forest extension
- Boosting algorithms
- Support vector machines
- The K-Means machine learning technique
- The neural networks related algorithms
- Tutorial - associative rules mining with CMS data
- Downloading the data
- Writing the R code for Apriori
- Shiny (R Code)
- Using custom CSS and fonts for the application
- Running the application
- Summary
- Enterprise Data Science
- Enterprise data science overview
- A roadmap to enterprise analytics success
- Data science solutions in the enterprise
- Enterprise data warehouse and data mining
- Traditional data warehouse systems
- Oracle Exadata, Exalytics, and TimesTen
- HP Vertica
- Teradata
- IBM data warehouse systems (formerly Netezza appliances)
- PostgreSQL
- Greenplum
- SAP Hana
- Enterprise and open source NoSQL Databases
- Kdb+
- MongoDB
- Cassandra
- Neo4j
- Cloud databases
- Amazon Redshift, Redshift Spectrum, and Athena databases
- Google BigQuery and other cloud services
- Azure CosmosDB
- GPU databases
- Brytlyt
- MapD
- Other common databases
- Enterprise data science – machine learning and AI
- The R programming language
- Python
- OpenCV, Caffe, and others
- Spark
- Deep learning
- H2O and Driverless AI
- Datarobot
- Command-line tools
- Apache MADlib
- Machine learning as a service
- Enterprise infrastructure solutions
- Cloud computing
- Virtualization
- Containers – Docker, Kubernetes, and Mesos
- On-premises hardware
- Enterprise Big Data
- Tutorial – using RStudio in the cloud
- Summary
- Closing Thoughts on Big Data
- Corporate big data and data science strategy
- Ethical considerations
- Silicon Valley and data science
- The human factor
- Characteristics of successful projects
- Summary
- External Data Science Resources
- Big data resources
- NoSQL products
- Languages and tools
- Creating dashboards
- Notebooks
- Visualization libraries
- Courses on R
- Courses on machine learning
- Machine learning and deep learning links
- Web-based machine learning services
- Movies
- Machine learning books from Packt
- Books for leisure reading
- Other Books You May Enjoy
- Leave a review - let other readers know what you think
Updated: 2021-07-02 19:27:06