
Data Transforming and Cleaning with T-SQL

Data comes from a wide range of sources. It can be relational or non-relational, connectivity to the source can be unstable, and many other issues can arise when data has to be extracted. This is why developers, statisticians, and data scientists should never fully trust the quality of source data. This chapter explains techniques for data transformation and cleansing using the Transact-SQL (T-SQL) language.

The following topics will be covered in this chapter:

  • The need for data transformation: This section presents the main goal of data transformation for data science purposes and, using examples, also provides several cases of what could happen to incoming data.
  • Database architectures for data transformations: Data transformations can vary from very simple to very complex. That's why it's necessary to choose an architecture that supports the most reliable set of transformation tasks.
  • Transforming data: This includes accuracy checks, deduplication, high-watermark for incremental loads, and so on. There are also many other actions that could be seen as transformations.
  • Denormalizing data: Because a lot of data comes from relational databases, its format is strongly normalized. Denormalization is a part of data transformation that makes the data better suited to analytical purposes.
  • Using views and stored procedures: Views and stored procedures are very common database objects, and they are just as commonly used for data transformations.
  • Performance considerations: A transformation should not take longer to run than the analysis it feeds. Another aspect of performance is the load that transformations place on source systems. That's why it's very important to be aware of data transformation performance.