Chapter 1. The Big Data Science Ecosystem

As a data scientist, you'll no doubt be very familiar with handling files and processing data, perhaps even in large amounts. However, as I'm sure you will agree, doing anything more than a simple analysis over a single type of data requires a method of organizing and cataloguing data so that it can be managed effectively. Indeed, this is the cornerstone of a great data scientist. As data volume and complexity increase, a consistent and robust approach can be the difference between generalized success and over-fitted failure!

This chapter is an introduction to an approach and ecosystem for achieving success with data at scale. It focuses on the data science tools and technologies. It introduces the environment, and how to configure it appropriately, but also explains some of the nonfunctional considerations relevant to the overall data architecture. While there is little actual data science at this stage, it provides the essential platform to pave the way for success in the rest of the book.

In this chapter, we will cover the following topics:

  • Data management responsibilities
  • Data architecture
  • Companion tools