Introduction

Simply put, Extract-Transform-Load (ETL) is the engine of any data warehouse. The job of an ETL system is straightforward to describe:

  • Extract data from operational databases/systems
  • Transform data according to the requirements of your data warehouse so that the different pieces of data can be used together
  • Apply data quality transformation methods in order to cleanse data and ensure that it is reliable before it gets loaded into a data warehouse
  • Load conformed data into a data warehouse so that end users can access it via reporting tools, using client applications directly, or with the help of SQL-based query tools
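The four steps above can be sketched in a few lines of handcoded Python. This is only a minimal illustration under stated assumptions, not how SAP Data Services or any other ETL tool works internally: the CSV text, the column names, the dim_customer table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file export from an operational system (assumed data).
SOURCE_CSV = """customer_id,name,country,revenue
1, Alice ,us,100.50
2,Bob,DE,not_available
3,,FR,75
4,Dana,fr,20.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: conform names, country codes, and data types."""
    conformed = []
    for row in rows:
        try:
            revenue = float(row["revenue"])
        except ValueError:
            revenue = None  # non-numeric value, left for the cleansing step
        conformed.append({
            "customer_id": int(row["customer_id"]),
            "name": row["name"].strip(),
            "country": row["country"].strip().upper(),
            "revenue": revenue,
        })
    return conformed


def cleanse(rows):
    """Data quality: split rows into reliable ones and rejected ones."""
    clean, rejected = [], []
    for row in rows:
        if row["name"] and row["revenue"] is not None:
            clean.append(row)
        else:
            rejected.append(row)
    return clean, rejected


def load(conn, rows):
    """Load: write conformed rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, revenue REAL)"
    )
    conn.executemany(
        "INSERT INTO dim_customer VALUES (:customer_id, :name, :country, :revenue)",
        rows,
    )
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run all four steps and return (rows loaded, rows rejected)."""
    clean, rejected = cleanse(transform(extract(text)))
    load(conn, clean)
    return len(clean), len(rejected)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    loaded, rejected = run_etl(warehouse)
    print(f"loaded={loaded} rejected={rejected}")
```

Keeping each step in its own function mirrors the separation in the list: rejected rows are returned rather than silently dropped, so a real process could log or repair them before the next run.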

While your data warehouse delivery structures, or data marts, represent the frontend (what users see when they access the data), the ETL system is the backend backbone that does all the work of moving data and getting it ready in time for users. Building the ETL system can be a challenging task, and although it is not part of the data warehouse data structures, it is a key factor in the success of the data warehouse solution as a whole. After all, who wants to use a data warehouse where the data is unreliable, corrupted, or sometimes even missing? Getting this right is exactly what ETL is responsible for.

The data structures most often used in ETL development to move data between sources and targets are flat files, XML datasets, and DBMS tables, the latter in both normalized schemas and dimensional data models. When choosing an ETL solution, you face two basic options: building a handcoded ETL solution or using a commercial one.

The following are some advantages of a handcoded ETL solution:

  • A programming language allows you to build your own sophisticated transformations
  • You are more flexible in building the ETL architecture as you are not limited by the vendor's ETL abilities
  • Sometimes, it can be a cheap way of building a few simple ETL processes, whereas buying an ETL solution from a vendor would be overkill
  • You do not have to spend time learning the commercial ETL solution's architecture and functionality

Here are some advantages of a commercial ETL solution:

  • It is usually a simpler, faster, and cheaper development option, as a variety of existing tools allow you to build a very sophisticated ETL architecture quickly
  • You do not have to be a professional programmer to use the tool
  • It automatically manages ETL metadata by collecting, storing, and presenting it to the ETL developer, which is another important aspect of any ETL solution
  • It has a huge range of additional ready-to-use functionality, from built-in schedulers to various connectors to existing systems, built-in data lineages, impact analysis reports, and many others

In the majority of data warehouse (DWH) projects, a commercial ETL solution, in spite of its higher immediate cost, eventually saves you a significant amount of money on the development and maintenance of ETL code.

SAP Data Services is an ETL solution provided by SAP and is part of the Enterprise Information Management product stack, which also includes SAP Information Steward; we will review Information Steward in one of the last chapters of this book.
