書名： R Machine Learning Projects
作者名： Dr. Sunil Kumar Chinnamgari
本章字數： 193字
更新時間： 2021-07-02 14:23:06

Big data

The term refers to large volumes of data that combine both structured data types (rows and columns similar to a table) and unstructured data types (text documents, voice recordings, image data, and so on). Due to the volume of data, it does not fit into the main memory of the hardware where ML algorithms need to be executed. Separate strategies are needed to work on these large volumes of data. Distributed processing of the data and combining the results (typically called MapReduce) is one strategy. It is also possible to process just enough data sequentially that can fit in a main memory each time and store the results somewhere on a hard drive; we need to repeat this process until the entirety of the data is processed completely. After the data processing, the results need to be combined to avail the final results of all the data that has been processed.

Special technologies such as Hadoop and Spark are required to perform ML on big data. Needless to say, you will need to hone specialized skills in order to apply ML algorithms successfully using these technologies on big data.

官术网_书友最值得收藏!

R Machine Learning Projects

Big data