Practical Big Data Analytics
by Nataraj Dasgupta
Last updated: 2021-07-02 19:27:06
The book is intended for existing and aspiring Big Data professionals who wish to become the go-to person in their organization when it comes to Big Data architecture, analytics, and governance. While no prior knowledge of Big Data or related technologies is assumed, it will be helpful to have some programming experience.
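Brand: 中圖公司 (China National Publications Import & Export Corporation)
Listed: 2021-07-02 18:21:59
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 (Shanghai Yuewen Information Technology Co., Ltd.) to produce and distribute it.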
- Cover
- Copyright Information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Too Big or Not Too Big
- What is big data?
- A brief history of data
- Dawn of the information age
- Dr. Alan Turing and modern computing
- The advent of the stored-program computer
- From magnetic devices to SSDs
- Why we are talking about big data now if data has always existed
- Definition of big data
- Building blocks of big data analytics
- Types of Big Data
- Structured
- Unstructured
- Semi-structured
- Sources of big data
- The 4Vs of big data
- When do you know you have a big data problem and where do you start your search for the big data solution?
- Summary
- Big Data Mining for the Masses
- What is big data mining?
- Big data mining in the enterprise
- Building the case for a Big Data strategy
- Implementation life cycle
- Stakeholders of the solution
- Implementing the solution
- Technical elements of the big data platform
- Selection of the hardware stack
- Selection of the software stack
- Summary
- The Analytics Toolkit
- Components of the Analytics Toolkit
- System recommendations
- Installing on a laptop or workstation
- Installing on the cloud
- Installing Hadoop
- Installing Oracle VirtualBox
- Installing CDH in other environments
- Installing Packt Data Science Box
- Installing Spark
- Installing R
- Steps for downloading and installing Microsoft R Open
- Installing RStudio
- Installing Python
- Summary
- Big Data With Hadoop
- The fundamentals of Hadoop
- The fundamental premise of Hadoop
- The core modules of Hadoop
- Hadoop Distributed File System - HDFS
- Data storage process in HDFS
- Hadoop MapReduce
- An intuitive introduction to MapReduce
- A technical understanding of MapReduce
- Block size and number of mappers and reducers
- Hadoop YARN
- Job scheduling in YARN
- Other topics in Hadoop
- Encryption
- User authentication
- Hadoop data storage formats
- New features expected in Hadoop 3
- The Hadoop ecosystem
- Hands-on with CDH
- WordCount using Hadoop MapReduce
- Analyzing oil import prices with Hive
- Joining tables in Hive
- Summary
- Big Data Mining with NoSQL
- Why NoSQL?
- The ACID, BASE, and CAP properties
- ACID and SQL
- The BASE property of NoSQL
- The CAP theorem
- The need for NoSQL technologies
- Google Bigtable
- Amazon Dynamo
- NoSQL databases
- In-memory databases
- Columnar databases
- Document-oriented databases
- Key-value databases
- Graph databases
- Other NoSQL types and summary of other types of databases
- Analyzing Nobel Laureates data with MongoDB
- JSON format
- Installing and using MongoDB
- Tracking physician payments with real-world data
- Installing kdb+, R, and RStudio
- Installing kdb+
- Installing R
- Installing RStudio
- The CMS Open Payments Portal
- Downloading the CMS Open Payments data
- Creating the Q application
- Loading the data
- The backend code
- Creating the frontend web portal
- R Shiny platform for developers
- Putting it all together - The CMS Open Payments application
- Applications
- Summary
- Spark for Big Data Analytics
- The advent of Spark
- Limitations of Hadoop
- Overcoming the limitations of Hadoop
- Theoretical concepts in Spark
- Resilient distributed datasets
- Directed acyclic graphs
- SparkContext
- Spark DataFrames
- Actions and transformations
- Spark deployment options
- Spark APIs
- Core components in Spark
- Spark Core
- Spark SQL
- Spark Streaming
- GraphX
- MLlib
- The architecture of Spark
- Spark solutions
- Spark practicals
- Signing up for Databricks Community Edition
- Spark exercise - hands-on with Spark (Databricks)
- Summary
- An Introduction to Machine Learning Concepts
- What is machine learning?
- The evolution of machine learning
- Factors that led to the success of machine learning
- Machine learning, statistics, and AI
- Categories of machine learning
- Supervised and unsupervised machine learning
- Supervised machine learning
- Vehicle Mileage Number Recognition and other examples
- Unsupervised machine learning
- Subdividing supervised machine learning
- Common terminologies in machine learning
- The core concepts in machine learning
- Data management steps in machine learning
- Pre-processing and feature selection techniques
- Centering and scaling
- The near-zero variance function
- Removing correlated variables
- Other common data transformations
- Data sampling
- Data imputation
- The importance of variables
- The train/test splits and cross-validation concepts
- Splitting the data into train and test sets
- The cross-validation parameter
- Creating the model
- Leveraging multicore processing in the model
- Summary
- Machine Learning Deep Dive
- The bias, variance, and regularization properties
- The gradient descent and VC Dimension theories
- Popular machine learning algorithms
- Regression models
- Association rules
- Confidence
- Support
- Lift
- Decision trees
- The Random forest extension
- Boosting algorithms
- Support vector machines
- The K-Means machine learning technique
- The neural networks related algorithms
- Tutorial - associative rules mining with CMS data
- Downloading the data
- Writing the R code for Apriori
- Shiny (R Code)
- Using custom CSS and fonts for the application
- Running the application
- Summary
- Enterprise Data Science
- Enterprise data science overview
- A roadmap to enterprise analytics success
- Data science solutions in the enterprise
- Enterprise data warehouse and data mining
- Traditional data warehouse systems
- Oracle Exadata, Exalytics, and TimesTen
- HP Vertica
- Teradata
- IBM data warehouse systems (formerly Netezza appliances)
- PostgreSQL
- Greenplum
- SAP Hana
- Enterprise and open source NoSQL Databases
- Kdb+
- MongoDB
- Cassandra
- Neo4j
- Cloud databases
- Amazon Redshift, Redshift Spectrum, and Athena databases
- Google BigQuery and other cloud services
- Azure CosmosDB
- GPU databases
- Brytlyt
- MapD
- Other common databases
- Enterprise data science – machine learning and AI
- The R programming language
- Python
- OpenCV, Caffe, and others
- Spark
- Deep learning
- H2O and Driverless AI
- DataRobot
- Command-line tools
- Apache MADlib
- Machine learning as a service
- Enterprise infrastructure solutions
- Cloud computing
- Virtualization
- Containers – Docker, Kubernetes, and Mesos
- On-premises hardware
- Enterprise Big Data
- Tutorial – using RStudio in the cloud
- Summary
- Closing Thoughts on Big Data
- Corporate big data and data science strategy
- Ethical considerations
- Silicon Valley and data science
- The human factor
- Characteristics of successful projects
- Summary
- External Data Science Resources
- Big data resources
- NoSQL products
- Languages and tools
- Creating dashboards
- Notebooks
- Visualization libraries
- Courses on R
- Courses on machine learning
- Machine learning and deep learning links
- Web-based machine learning services
- Movies
- Machine learning books from Packt
- Books for leisure reading
- Other Books You May Enjoy
- Leave a review - let other readers know what you think