官术网_书友最值得收藏!

Summary

This chapter introduced some of the key tools used for data science. In particular, it demonstrated how to download and install the virtual machine for the Cloudera Distribution of Hadoop (CDH), Spark, R, RStudio, and Python. Although the user can download the source code of Hadoop and install it on, say, a Unix system, it is usually fraught with issues and requires a fair amount of debugging. Using a VM instead allows the user to begin using and learning Hadoop with minimal effort as it is a complete preconfigured environment.

Additionally, R and Python are the two most commonly used languages for machine learning and in general, analytics. They are available for all popular operating systems. Although they can be installed in the VM, the user is encouraged to try and install them on their local machines (laptop/workstation) if feasible as it will have relatively higher performance.

In the next chapter, we will pe deeper into the details of Hadoop and its core components and concepts.

主站蜘蛛池模板: 临邑县| 富源县| 昭苏县| 项城市| 马关县| 西林县| 辉县市| 将乐县| 托里县| 永修县| 宝兴县| 湖北省| 保靖县| 云林县| 达日县| 凯里市| 泗洪县| 铁岭市| 宕昌县| 潜山县| 望城县| 洱源县| 灵山县| 通江县| 吕梁市| 呼玛县| 河南省| 西乌珠穆沁旗| 托克托县| 英山县| 依兰县| 本溪| 金湖县| 安多县| 西乌珠穆沁旗| 永善县| 桃园县| 蚌埠市| 岑溪市| 咸阳市| 镇安县|