
Data Cleaning and Advanced Machine Learning

The goal of data analytics, in general, is to uncover actionable insights that result in positive business outcomes. In the case of predictive analytics, the aim is to do this by determining the most likely future outcome of a target, based on previous trends and patterns.

The benefits of predictive analytics are not restricted to big technology companies. Any business can find ways to benefit from machine learning, given the right data.

Companies all around the world are collecting massive amounts of data and using predictive analytics to cut costs and increase profits. Some of the most prominent examples come from the technology giants Google, Facebook, and Amazon, which use big data on a huge scale. For example, Google and Facebook serve you personalized ads based on predictive algorithms that guess what you are most likely to click on. Similarly, Amazon recommends products that you are most likely to buy, given your previous purchases.

Modern predictive analytics is done with machine learning, where computer models are trained to learn patterns from data. As we saw briefly in the previous chapter, software such as scikit-learn can be used with Jupyter Notebooks to efficiently build and test machine learning models. As we will continue to see, Jupyter Notebooks are an ideal environment for doing this type of work, as we can perform ad-hoc testing and analysis, and easily save the results for reference later.
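To make "trained to learn patterns from data" concrete, here is a minimal sketch of the fit-then-evaluate loop in scikit-learn. It assumes a synthetic dataset generated with `make_classification` as a stand-in for real labeled records, and uses a random forest purely as an illustrative model choice; neither is meant to reflect the specific data or models used later in the chapter.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labeled dataset (assumption: generated numbers, not real records).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=0)

# Hold out 30% of the rows so the model is scored on data it never saw during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Train the model: it learns patterns that map feature values to the target label.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Predict the most likely outcome for unseen rows, then measure accuracy on the test set.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)  # fraction of test rows classified correctly
print(f"Test accuracy: {accuracy:.3f}")
```

Because each step is a separate statement, a notebook lets you rerun only the training cell with different settings and compare the printed scores side by side.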

In this chapter, we will again take a hands-on approach by running through various examples and activities in a Jupyter Notebook. Whereas we saw only a couple of examples of machine learning in the previous chapter, here we'll take a much slower and more thoughtful approach. Using an employee retention problem as our overarching example for the chapter, we will discuss how to approach predictive analytics, what to consider when preparing the data for modeling, and how to implement and compare a variety of models using Jupyter Notebooks.

By the end of this chapter, you will be able to:

  • Plan a machine learning classification strategy
  • Preprocess data to prepare it for machine learning
  • Train classification models
  • Use validation curves to tune model parameters
  • Use dimensionality reduction to enhance model performance
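As a rough preview of how three of these objectives can fit together in scikit-learn, the sketch below chains preprocessing (standard scaling), dimensionality reduction (PCA), and a classifier into one pipeline, then uses a validation curve to compare cross-validated scores across different numbers of PCA components. The synthetic data, the logistic regression model, and the parameter range are all illustrative assumptions, not the chapter's own setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 10 features, only 4 of which carry signal.
X, y = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=1)

# Preprocess -> reduce dimensionality -> classify, fitted together on each CV fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Vary the number of PCA components and record train/validation scores per fold.
param_range = [1, 2, 4, 6, 8, 10]
train_scores, val_scores = validation_curve(
    pipe, X, y,
    param_name="pca__n_components",
    param_range=param_range,
    cv=5,
)

# Average over folds and pick the setting with the best mean validation score.
mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
best = param_range[int(np.argmax(mean_val))]
for n, tr, va in zip(param_range, mean_train, mean_val):
    print(f"n_components={n:2d}  train={tr:.3f}  validation={va:.3f}")
print("Best number of components:", best)
```

Wrapping the scaler and PCA inside the pipeline means they are refitted on each training fold, so the validation scores are not inflated by information leaking from the held-out rows.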