官术网_书友最值得收藏!

An introduction to data warehousing

Before actually getting started, there is an important concept that needs to be covered and that is data warehouse modeling. The data warehouse is a concept that has been around since the 1970s, it is a central database built for reporting that integrates data from disparate sources to a common location and a common schema. It also removes the direct connection between the source systems and the historical data. This means that if you change your source system, you will retain historical records and can decommission the old system.

The schema in a data warehouse in many cases is built in Third normal form (3NF) to ensure that data is only stored once to minimize the storage cost and make it easier to maintain.

Tip

For more information about 3NF, refer to http://en.wikipedia.org/wiki/3NF.

This is a strategy commonly referenced as an Inmon data warehouse coming from the father of data warehousing, Bill Inmon. Also have a look at http://inmoninstitute.com/about/index.

There is a drawback with 3NF, that is, it is not a model built for querying; it requires many joins in the queries to write the simplest report. So a simpler model of the data is often necessary. A common model built for querying is the dimensional model defined by Ralph Kimball and is available at http://www.kimballgroup.com/. The dimensional model defines those things that you want to measure and should be stored in a fact table. Around the fact table, you will have multiple dimension tables containing the things that you would like to slice the facts by.

In the preceding figure, you have the fact table defined containing each order row, you will have the measures such as order quantity and sales amount, as well as the keys referencing Product, Seller, Time, and Customer on each row of the table. The dimension tables contain attributes such as year, month, and day in the Time table, Customer name, address, and customer number in the Customer table. This design is often referenced to as a star schema.

Analysis Services is built with multidimensional modeling in mind and works best with data warehouses or data marts that use this technique; however, as you will later see, there is a possibility to work with all kinds of schemas in the database through the use of data source views in Analysis Services.

During the course of this book we will work with the AdventureWorks2012DW database. This is a data warehouse built for the fictitious company called Adventure Works Cycles. They have a data warehouse built using dimensional modeling with several fact tables containing the things that they want to measure in their business.

Tip

For a background on Adventure Works and their business, refer to the following description:

http://technet.microsoft.com/en-us/library/ms124825(v=SQL.100).aspx

An in-depth description of their data warehouse can be found at http://technet.microsoft.com/en-us/library/ms124623(v=sql.100).aspx.

主站蜘蛛池模板: 仙桃市| 阳江市| 利辛县| 渝北区| 库尔勒市| 怀柔区| 西峡县| 化德县| 辛集市| 西青区| 格尔木市| 麟游县| 玉溪市| 绵阳市| 即墨市| 苗栗县| 阿瓦提县| 清涧县| 社旗县| 伽师县| 永平县| 许昌县| 读书| 越西县| 茶陵县| 鄂托克前旗| 牙克石市| 伊通| 辉县市| 榆林市| 西城区| 海宁市| 鹿泉市| 井冈山市| 延川县| 邢台县| 普陀区| 鹤峰县| 大邑县| 临潭县| 景宁|