
Big data

The term refers to large volumes of data that combine structured data (rows and columns, as in a table) with unstructured data (text documents, voice recordings, images, and so on). Because of its volume, such data does not fit into the main memory of the hardware on which the ML algorithms need to run, so separate strategies are needed to work with it. One strategy is to distribute the processing of the data and then combine the partial results, an approach typically called MapReduce. Another is to process the data sequentially, one chunk at a time, where each chunk is small enough to fit in main memory and its results are stored on a hard drive; this is repeated until all of the data has been processed. The stored partial results are then combined to obtain the final result for the entire dataset.
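The chunk-by-chunk strategy can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a production implementation: the function names (`read_chunks`, `summarize_chunk`, `combine`, `chunked_mean`), the `value` column, and the sample data are all hypothetical, and the partial results are kept in a small list rather than written to a hard drive to keep the example short.

```python
import csv
import io
from itertools import islice


def read_chunks(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows from an iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def summarize_chunk(chunk, column):
    """Reduce one in-memory chunk to a small partial result (sum, count)."""
    values = [float(row[column]) for row in chunk]
    return sum(values), len(values)


def combine(partials):
    """Merge the partial results of all chunks into the overall mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count if count else float("nan")


def chunked_mean(rows, column, chunk_size):
    """Compute a column mean while holding only one chunk in memory at a time."""
    partials = [summarize_chunk(c, column) for c in read_chunks(rows, chunk_size)]
    return combine(partials)


# Tiny in-memory stand-in for a large CSV file (hypothetical data).
sample = io.StringIO("value\n1\n2\n3\n4\n5\n6\n7\n")
reader = csv.DictReader(sample)
mean_value = chunked_mean(reader, "value", chunk_size=3)
print(mean_value)
```

The same split into a per-chunk summary and a final combining step is what makes the computation easy to distribute: each chunk could, in principle, be summarized on a different machine before the partial results are merged.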

Specialized technologies such as Hadoop and Spark are typically needed to perform ML on big data. Applying ML algorithms successfully with these technologies requires specialized skills of its own.
