官术网_书友最值得收藏!

Introduction

Now, let's go for the real stuff! This chapter will cover many topics that will lay out the foundations of a simple and effective ETL solution. Over the years, I have seen many SSIS implementations and one of the goals of this chapter is to give the readers the following:

  • A simple but effective SSIS framework
  • SSIS development best practices
  • New data source integrations

All remaining chapters assume that we want to load a data warehouse that is a star schema with its staging area.

The source (operational) database used is AdventureWorksLT, an old well-known database. The following diagram describes the source database that we're going to use:

From this database, we'll insert data in a staging area and finally into a data warehouse. The staging area and the data warehouse will be separated in schemas in a database that we'll manage using SSDT.

The following diagram is the representation that describes the staging schema of the AdventureWorksLTDW database:

Not all tables are copied in the staging area and three sections have been identified:

  • Orders: These tables contain order information as well as dates related information. In the data warehouse section, these sections have their own tables.
  • Customer: These tables contain information related to a customer and their addresses. In the data warehouse, these tables are grouped.
  • Product: These tables contain the product information such as model, description in multiple languages, and so on. Like customer data, these tables are grouped in the data warehouse.

Once in the staging area, the data will be copied into a star schema database representation. The tables are in the same database, but have a separate schema called DW. The following diagram shows the DW tables:

The goal of the data warehouse is to ease data consumptions. It's easy to understand by most users and data is categorized into areas (tables) that represent the subjects that the end users will base their analysis on.

The customer information has been regrouped into two tables, DimCustomer and DimAddress. The DimAddress table has two links to the fact table. These relationships represent the multiple addresses, two in our case: the billing and shipping address of the customer.

The product information has been flattened into one dimension: DimProduct. Although the base model allowed for more than one language when it comes to the product descriptions, only two are retained in the dimension: French and English - EnglishDesctiption and FrenchDescription.

The orders tables have been merged into one fact table: FactOrders. The DimTime dimension has been added to allow better querying of the orders using various dates: order, shipped, and due dates. The SalesOrderNumber and PurchaseOrderNumber are considered derived dimensions and stay in the fact table. We don't have enough information that can be derived from these columns and they are strongly tied to the facts.

The remaining dimension, DimOrderProvenance, has no source in the AdventureWorksLT operational database. It has been added and is managed by Master Data Services, another service that comes with SQL Server 2016 Developer Edition. We'll talk about this service and this dimension later in this book.

In the next few recipes, we'll deploy these databases and the ETL's (SSIS packages) that load these tables.

主站蜘蛛池模板: 上蔡县| 肥东县| 安国市| 深州市| 乌什县| 瑞昌市| 万宁市| 偏关县| 揭东县| 中西区| 大化| 启东市| 阳东县| 南郑县| 包头市| 鸡东县| 泰宁县| 原阳县| 桂林市| 连南| 手游| 鄂伦春自治旗| 灵川县| 古蔺县| 临夏市| 德安县| 海兴县| 云龙县| 甘肃省| 丹巴县| 大悟县| 中方县| 镇江市| 克东县| 惠安县| 延寿县| 澄城县| 神农架林区| 屏山县| 彭泽县| 曲松县|