
Summary

In this chapter, we explained why ML is a crucial tool in a data scientist's toolkit. We discussed what a structured ML dataset looks like and how to identify the types of features in a dataset.

We took a deep dive into the Naive Bayes classification algorithm and studied how it uses Bayes' theorem. We learned that Bayes' theorem lets us compute the probability of each possible outcome (class) given the observed values of the features. Naive Bayes simplifies this calculation by assuming that the features are conditionally independent given the class, and then predicts the class with the highest probability.
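The decision rule described above can be sketched in plain Python. This is a minimal multinomial Naive Bayes over whitespace-separated tokens with Laplace (add-one) smoothing, and it is not the SageMaker or Spark implementation used in this chapter. The function names, the toy documents, and the labels `A` and `B` are hypothetical and only illustrate the technique. Probabilities are summed in log space so that products of many small likelihoods do not underflow.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed per-class log likelihoods of tokens."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)

    n_docs = len(docs)
    # Prior P(class): fraction of training documents with that label.
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}

    # Likelihood P(token | class) with add-alpha (Laplace) smoothing.
    log_likelihood = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(doc, log_prior, log_likelihood, vocab):
    """Return the class with the highest posterior score for a document."""
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for token in doc.lower().split():
            if token in vocab:  # tokens never seen in training are ignored
                score += log_likelihood[c][token]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data: short texts from two sources, "A" and "B".
docs = [
    "great rally tonight",
    "thank you great crowd",
    "fake news media",
    "media is dishonest fake",
]
labels = ["A", "A", "B", "B"]

log_prior, log_likelihood, vocab = train_naive_bayes(docs, labels)
print(predict("great crowd tonight", log_prior, log_likelihood, vocab))
print(predict("fake media", log_prior, log_likelihood, vocab))
```

Because the naive independence assumption turns the joint likelihood into a product of per-token terms, the score for each class is just the log prior plus a sum of per-token log likelihoods, which is also why the algorithm distributes well across engines such as Spark.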

We also worked through an example using a Twitter dataset. We hope that you learned how to frame a text classification problem and how to build a Naive Bayes classification model that predicts the source of a tweet. We showed how to implement the algorithm both with SageMaker and with Apache Spark. This code base should serve as a starting point for other text classification problems you tackle in the future. Because the implementation uses SageMaker services and Spark, it can scale to datasets that are gigabytes or even terabytes in size.

We will look at how to deploy the ML models on actual production clusters in later chapters. 
