官术网_书友最值得收藏!

Summary

This chapter introduced some of the key tools used for data science. In particular, it demonstrated how to download and install the virtual machine for the Cloudera Distribution of Hadoop (CDH), Spark, R, RStudio, and Python. Although the user can download the source code of Hadoop and install it on, say, a Unix system, it is usually fraught with issues and requires a fair amount of debugging. Using a VM instead allows the user to begin using and learning Hadoop with minimal effort as it is a complete preconfigured environment.

Additionally, R and Python are the two most commonly used languages for machine learning and in general, analytics. They are available for all popular operating systems. Although they can be installed in the VM, the user is encouraged to try and install them on their local machines (laptop/workstation) if feasible as it will have relatively higher performance.

In the next chapter, we will pe deeper into the details of Hadoop and its core components and concepts.

主站蜘蛛池模板: 桐庐县| 孝感市| 万山特区| 大悟县| 额济纳旗| 滦平县| 台北市| 永寿县| 那坡县| 民县| 鲜城| 孟津县| 彩票| 湖州市| 靖边县| 文成县| 万宁市| 多伦县| 抚顺市| 信宜市| 贵州省| 进贤县| 台东市| 四子王旗| 遂昌县| 隆回县| 习水县| 和田县| 时尚| 静海县| 灵璧县| 句容市| 霍山县| 休宁县| 和政县| 固原市| 永宁县| 宕昌县| 红原县| 五华县| 洞头县|