
Batch processing

Traditionally, the data processing pipeline within data warehousing systems consisted of extracting, transforming, and loading the data (ETL) for analysis and action. With the new paradigm of file-based distributed computing, the sequence of these steps has shifted. The data is now extracted and loaded first, and then transformed repeatedly for analysis (ELTTT).
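To make the change in ordering concrete, here is a minimal sketch of the extract, load, then transform-many-times pattern. The record layout, the function names, and the in-memory `raw_store` list are assumptions made for illustration; in a real system the raw store would be files on a distributed filesystem rather than a Python list.

```python
from collections import Counter

# Hypothetical raw records as they might arrive from a source system.
SOURCE_ROWS = [
    {"student": "s1", "date": "2024-03-01", "status": "present"},
    {"student": "s2", "date": "2024-03-01", "status": "absent"},
    {"student": "s1", "date": "2024-03-04", "status": "present"},
    {"student": "s2", "date": "2024-03-04", "status": "present"},
]


def extract(source):
    """Extract: copy rows out of the source system unchanged."""
    return [dict(row) for row in source]


def load(rows, raw_store):
    """Load: persist the raw rows before any transformation happens."""
    raw_store.extend(rows)


def present_days_per_student(raw_store):
    """Transform 1: count the days each student was present."""
    return Counter(r["student"] for r in raw_store if r["status"] == "present")


def absences_per_date(raw_store):
    """Transform 2: count absences on each date, from the same raw data."""
    return Counter(r["date"] for r in raw_store if r["status"] == "absent")


raw_store = []  # stands in for raw files on a distributed filesystem (assumption)
load(extract(SOURCE_ROWS), raw_store)

# The raw data is loaded once and then transformed as many times as needed.
by_student = present_days_per_student(raw_store)
by_date = absences_per_date(raw_store)
```

Because the untransformed rows are kept, a new view can be added later by writing another transform function, without going back to the source system.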

In batch processing, data is collected from various sources into staging areas, then loaded and transformed at a defined frequency and on a defined schedule. Most batch use cases have no critical need to process the data in real time or near real time. For example, a monthly report on student attendance is generated by a batch process at the end of each calendar month. This process extracts the data from the source systems, loads it, and transforms it into various views and reports. One of the most popular batch processing frameworks is Apache Hadoop, a highly scalable, distributed, parallel processing framework. The primary building block of Hadoop is the Hadoop Distributed File System (HDFS).
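The monthly attendance report described above can be sketched as a small batch job that processes only the records falling inside one calendar month. The staged record layout and the function name `monthly_attendance_report` are hypothetical, and the scheduling itself (for example, a cron entry) is assumed to live outside this code.

```python
from datetime import date

# Hypothetical staged records: (student, day, was_present).
STAGED = [
    ("alice", date(2024, 3, 1), True),
    ("alice", date(2024, 3, 4), False),
    ("bob", date(2024, 3, 1), True),
    ("bob", date(2024, 4, 1), True),  # belongs to the next batch window
]


def monthly_attendance_report(staged, year, month):
    """Return {student: (days_present, days_recorded)} for one calendar month."""
    report = {}
    for student, day, was_present in staged:
        if (day.year, day.month) != (year, month):
            continue  # record is outside this batch window
        present, recorded = report.get(student, (0, 0))
        report[student] = (present + int(was_present), recorded + 1)
    return report


# An external scheduler would trigger this once the month has ended.
march_report = monthly_attendance_report(STAGED, 2024, 3)
```

Nothing here has to run as the data arrives. The job can wait until the whole month is available and then process it in one pass.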

As the name suggests, this is a wrapper filesystem that stores data (structured, unstructured, or semi-structured) in a distributed manner on data nodes within Hadoop. Instead of moving the data to the processing logic, the processing logic is sent to the nodes that hold the data. Once an individual node has performed its computation, the results are consolidated by the master process. In this paradigm of data-compute localization, Hadoop relies heavily on intermediate I/O operations on hard disk drives. As a result, Hadoop can process extremely large volumes of data reliably, at the cost of processing time. This framework is well suited to extracting value from Big Data in batch mode.
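The idea of sending the compute to the data and consolidating the results on a master can be illustrated with a single-process simulation. This is not Hadoop's API. The node names, the `NODE_BLOCKS` layout, and the functions `count_statuses` and `run_job` are assumptions chosen only to show the shape of the pattern.

```python
from collections import Counter

# Each list stands in for a block of data held locally by one data node.
NODE_BLOCKS = {
    "node-1": ["present", "absent", "present"],
    "node-2": ["present", "present"],
    "node-3": ["absent"],
}


def count_statuses(block):
    """The compute shipped to a node: it sees only that node's local block."""
    return Counter(block)


def run_job(node_blocks, compute):
    """Master process: run `compute` per node, then consolidate the partial results."""
    partials = {node: compute(block) for node, block in node_blocks.items()}
    total = Counter()
    for partial in partials.values():
        total.update(partial)  # add each node's counts into the final result
    return total


totals = run_job(NODE_BLOCKS, count_statuses)
```

In this sketch only the small per-node counts travel to the master, while the blocks themselves never move. In a real cluster the partial results would also be written to disk between stages, which is where the intermediate I/O cost mentioned above comes from.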
