官术网_书友最值得收藏!

Introduction

Now, let's go for the real stuff! This chapter will cover many topics that will lay out the foundations of a simple and effective ETL solution. Over the years, I have seen many SSIS implementations and one of the goals of this chapter is to give the readers the following:

  • A simple but effective SSIS framework
  • SSIS development best practices
  • New data source integrations

All remaining chapters assume that we want to load a data warehouse that is a star schema with its staging area.

The source (operational) database used is AdventureWorksLT, an old well-known database. The following diagram describes the source database that we're going to use:

From this database, we'll insert data in a staging area and finally into a data warehouse. The staging area and the data warehouse will be separated in schemas in a database that we'll manage using SSDT.

The following diagram is the representation that describes the staging schema of the AdventureWorksLTDW database:

Not all tables are copied in the staging area and three sections have been identified:

  • Orders: These tables contain order information as well as dates related information. In the data warehouse section, these sections have their own tables.
  • Customer: These tables contain information related to a customer and their addresses. In the data warehouse, these tables are grouped.
  • Product: These tables contain the product information such as model, description in multiple languages, and so on. Like customer data, these tables are grouped in the data warehouse.

Once in the staging area, the data will be copied into a star schema database representation. The tables are in the same database, but have a separate schema called DW. The following diagram shows the DW tables:

The goal of the data warehouse is to ease data consumptions. It's easy to understand by most users and data is categorized into areas (tables) that represent the subjects that the end users will base their analysis on.

The customer information has been regrouped into two tables, DimCustomer and DimAddress. The DimAddress table has two links to the fact table. These relationships represent the multiple addresses, two in our case: the billing and shipping address of the customer.

The product information has been flattened into one dimension: DimProduct. Although the base model allowed for more than one language when it comes to the product descriptions, only two are retained in the dimension: French and English - EnglishDesctiption and FrenchDescription.

The orders tables have been merged into one fact table: FactOrders. The DimTime dimension has been added to allow better querying of the orders using various dates: order, shipped, and due dates. The SalesOrderNumber and PurchaseOrderNumber are considered derived dimensions and stay in the fact table. We don't have enough information that can be derived from these columns and they are strongly tied to the facts.

The remaining dimension, DimOrderProvenance, has no source in the AdventureWorksLT operational database. It has been added and is managed by Master Data Services, another service that comes with SQL Server 2016 Developer Edition. We'll talk about this service and this dimension later in this book.

In the next few recipes, we'll deploy these databases and the ETL's (SSIS packages) that load these tables.

主站蜘蛛池模板: 成都市| 历史| 徐闻县| 合水县| 平度市| 青岛市| 泰来县| 望都县| 天水市| 中山市| 沈丘县| 阿尔山市| 察雅县| 鸡东县| 芒康县| 荔浦县| 枣强县| 武汉市| 工布江达县| 卢氏县| 江津市| 正安县| 明水县| 祁阳县| 神农架林区| 常山县| 惠来县| 松桃| 宝丰县| 洪江市| 云南省| 武清区| 永胜县| 彝良县| 永靖县| 德昌县| 寿宁县| 武陟县| 黄陵县| 怀集县| 石城县|