What this book covers

Chapter 1, Scala Overview, gives you a quick run-through of Scala and its features, preparing you for the chapters that follow.

Chapter 2, Data Analysis Life Cycle, turns the focus exclusively to data analysis and its typical life cycle. It provides an overview of the steps involved in the data analysis life cycle.

Chapter 3, Data Ingestion, deep-dives into the data ingestion aspects of the data life cycle. It covers the tasks of extracting, staging, validating, cleaning, and shaping data. It also highlights how to deal with the variety aspect of data, that is, how to handle data from different sources in different formats.

Chapter 4, Data Exploration and Visualization, deep-dives into the data exploration and visualization parts of the life cycle. It familiarizes the reader with techniques for discovering inherent properties associated with data using statistical as well as visual methods.

Chapter 5, Applying Statistics and Hypothesis Testing, provides an overview of the statistical methods used in data analysis and covers techniques for deriving meaningful insights from data.

Chapter 6, Intro to Spark for Distributed Data Analysis, covers the transition to performing data analysis at scale on distributed systems. It introduces Spark, a Scala-based distributed framework for data processing, guides you through Spark setup on your computer, and presents key features using practical examples.

Chapter 7, Traditional Machine Learning for Data Analysis, covers topics such as decision trees, random forests, lasso regression, and k-means cluster analysis. It also covers the role of NLP in effectively analyzing certain types of data.

Chapter 8, Near Real-Time Data Analysis Using Streaming, introduces the concept of stream-oriented processing and compares it to traditional batch-oriented processing. It also illustrates how streaming can be used to perform near real-time data analysis. This chapter deep-dives into Spark Streaming and guides you through implementing clustering and a classifier using the Spark Streaming APIs.

Chapter 9, Working with Data at Scale, is dedicated to processing data at scale. It looks at data analysis from multiple dimensions, such as cost, reliability, and performance, and offers guidance on best practices for reliability and performance. It gives a complete picture of how a practical, real-world data analysis life cycle works and will help you put it into practice in a production environment.
