- SQL Server 2017 Integration Services Cookbook
- Christian Cote Matija Lah Dejan Sarka
- 530字
- 2021-07-02 20:41:36
Introduction
Now, let's go for the real stuff! This chapter will cover many topics that will lay out the foundations of a simple and effective ETL solution. Over the years, I have seen many SSIS implementations and one of the goals of this chapter is to give the readers the following:
- A simple but effective SSIS framework
- SSIS development best practices
- New data source integrations
All remaining chapters assume that we want to load a data warehouse that is a star schema with its staging area.
The source (operational) database used is AdventureWorksLT, an old well-known database. The following diagram describes the source database that we're going to use:

From this database, we'll insert data in a staging area and finally into a data warehouse. The staging area and the data warehouse will be separated in schemas in a database that we'll manage using SSDT.
The following diagram is the representation that describes the staging schema of the AdventureWorksLTDW database:

Not all tables are copied in the staging area and three sections have been identified:
- Orders: These tables contain order information as well as dates related information. In the data warehouse section, these sections have their own tables.
- Customer: These tables contain information related to a customer and their addresses. In the data warehouse, these tables are grouped.
- Product: These tables contain the product information such as model, description in multiple languages, and so on. Like customer data, these tables are grouped in the data warehouse.
Once in the staging area, the data will be copied into a star schema database representation. The tables are in the same database, but have a separate schema called DW. The following diagram shows the DW tables:

The goal of the data warehouse is to ease data consumptions. It's easy to understand by most users and data is categorized into areas (tables) that represent the subjects that the end users will base their analysis on.
The customer information has been regrouped into two tables, DimCustomer and DimAddress. The DimAddress table has two links to the fact table. These relationships represent the multiple addresses, two in our case: the billing and shipping address of the customer.
The product information has been flattened into one dimension: DimProduct. Although the base model allowed for more than one language when it comes to the product descriptions, only two are retained in the dimension: French and English - EnglishDesctiption and FrenchDescription.
The orders tables have been merged into one fact table: FactOrders. The DimTime dimension has been added to allow better querying of the orders using various dates: order, shipped, and due dates. The SalesOrderNumber and PurchaseOrderNumber are considered derived dimensions and stay in the fact table. We don't have enough information that can be derived from these columns and they are strongly tied to the facts.
The remaining dimension, DimOrderProvenance, has no source in the AdventureWorksLT operational database. It has been added and is managed by Master Data Services, another service that comes with SQL Server 2016 Developer Edition. We'll talk about this service and this dimension later in this book.
In the next few recipes, we'll deploy these databases and the ETL's (SSIS packages) that load these tables.
- C語言程序設計(第3版)
- 圖解Java數據結構與算法(微課視頻版)
- 神經網絡編程實戰:Java語言實現(原書第2版)
- Production Ready OpenStack:Recipes for Successful Environments
- Data Analysis with IBM SPSS Statistics
- Cassandra Data Modeling and Analysis
- Scala編程實戰(原書第2版)
- Python時間序列預測
- Python機器學習基礎教程
- 劍指大數據:企業級數據倉庫項目實戰(在線教育版)
- Android驅動開發權威指南
- 深入分析GCC
- Python網絡爬蟲實例教程(視頻講解版)
- C# 7.1 and .NET Core 2.0:Modern Cross-Platform Development(Third Edition)
- 軟件開發中的決策:權衡與取舍