- Machine Learning for Developers
- Rodolfo Bonnin
- 287 words
- 2021-07-02 15:46:51
The ETL process
The early stages of large-scale data processing evolved over several decades under the name of data mining, and the field later adopted the popular name of big data.
One of the best outcomes of these disciplines is the specification of the Extract, Transform, Load (ETL) process.
This process starts with a mix of many data sources from business systems, then moves to a system that transforms the data into a readable state, and then finishes by generating a data mart with very structured and documented data types.
To apply this concept, we will combine the elements of this process so that the final outcome is a structured dataset, which in its final form includes an additional label column (in the case of supervised learning problems).
This process is depicted in the following diagram:

The diagram illustrates the first stages of the data pipeline. It starts with all of the organization's data, whether commercial transactions, raw values from IoT devices, or elements from other valuable data sources, which commonly come in very different types and compositions. The ETL process is in charge of gathering the raw information from these sources using different software filters, applying the necessary transforms to arrange the data in a useful manner, and finally presenting the data in tabular format (we can think of this as a single database table with a final feature or result column, or a big CSV file with consolidated data). The following processes can then use the final result with practically no concern for the many quirks of data formatting, because these have been standardized into a very clear table structure.
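As a minimal sketch of the three stages described above, the following Python snippet extracts records from two sources, transforms them into a common schema, and loads them into a single CSV table whose last column is the label. The sources, column names, and the `churned` field are hypothetical and invented purely for illustration; a real pipeline would read from actual business systems.

```python
import csv
import io

# Hypothetical raw sources, invented for illustration only.
TRANSACTIONS = [
    {"customer": "c1", "amount": "120.50", "churned": "yes"},
    {"customer": "c2", "amount": "80.00", "churned": "no"},
]
IOT_READINGS = ["c1;21.5", "c2;19.0"]  # raw "customer;temperature" strings

# Target schema: feature columns first, label column last.
COLUMNS = ["customer", "amount", "temperature", "label"]


def extract():
    """Gather the raw records from every source, unchanged."""
    return TRANSACTIONS, IOT_READINGS


def transform(transactions, readings):
    """Parse types, join the sources by customer, and build the label."""
    temperature = {}
    for line in readings:
        customer, value = line.split(";")
        temperature[customer] = float(value)

    rows = []
    for record in transactions:
        rows.append({
            "customer": record["customer"],
            "amount": float(record["amount"]),
            # None (written as an empty cell) if no reading exists.
            "temperature": temperature.get(record["customer"]),
            "label": 1 if record["churned"] == "yes" else 0,
        })
    return rows


def load(rows):
    """Write the consolidated table as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = load(transform(*extract()))
print(table)
```

Keeping the three stages as separate functions mirrors the description in the text: later processes only ever see the output of `load`, so they do not need to know how each source was originally formatted.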