
Summary

In the first chapter, we explained the ambiguity of Big Data definitions and highlighted its major features. We also discussed the deluge of Big Data sources and noted that even a single event, such as Messi's goal, can trigger an avalanche of data created almost instantaneously.

You were then introduced to some of the most commonly used Big Data tools that we will be working with later, such as Hadoop with its Distributed File System and the parallel MapReduce framework, traditional SQL databases and NoSQL databases, and the Apache Spark project, which allows faster (and in many cases easier) data processing than Hadoop.

We ended the chapter by presenting the origins of the R programming language, its gradual evolution into the most widely used statistical computing environment, and the current position of R among the spectrum of Big Data analytics tools.

In the next chapter, you will finally have a chance to get your hands dirty and learn, or revise, a number of frequently used R functions for data management, transformation, and analysis.
