
Chapter 2. Managing and Understanding Data

A key early step in any machine learning project is managing and understanding the data you have collected. Although you may not find it as gratifying as building and deploying models, the stages in which you begin to see the fruits of your labor, you cannot ignore this preparatory work.

Any learning algorithm is only as good as its input data, and in many cases, input data is complex, messy, and spread across multiple sources and formats. Because of this complexity, the largest share of the effort in a machine learning project goes to data preparation and exploration.

This chapter is divided into three main sections. The first section discusses the basic data structures R uses to store data. You will become very familiar with these structures as you create and manipulate datasets. The second section is practical, covering several functions that are useful for getting data in and out of R. The third section illustrates methods for understanding data by exploring a real-world dataset.

By the end of this chapter, you will understand:

  • The basic R data structures and how to use them to store and extract data
  • How to get data into R from a variety of source formats
  • Common methods for understanding and visualizing complex data

Since the way R thinks about data will define the way you think about data, it is helpful to understand the basic R data structures before jumping into data preparation. However, if you are already familiar with R data structures, feel free to skip ahead to the section on data preprocessing.
