
Data integration

In a typical scenario, data comes from many different sources. Data integration is the technique of combining data from these disparate sources and providing end users with a unified view of that data, abstracting the underlying sources away from them.

Mathematically, a data integration system is formally defined as a triple <G, S, M>, where:

  • G is the global schema
  • S is the heterogeneous set of source schemas
  • M is the mapping that maps queries between the source and the global schemas

Both G and S are expressed in languages over alphabets composed of symbols for each of their respective relations. The mapping M consists of assertions between queries over G and queries over S.
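
To make this definition concrete, the following is a minimal sketch in Python; all of the table names, field names, and records here are hypothetical. Two source schemas (S) describe customers using different field names, the global schema (G) fixes a single unified record shape, and the mapping (M) is a set of functions that rewrite each source's records into G:

    # A minimal sketch of a <G, S, M> data integration system.
    # All schemas, field names, and records are hypothetical.

    # S: two heterogeneous source schemas with different field names.
    crm_rows = [{"cust_id": 1, "full_name": "Ada Lovelace", "region": "EU"}]
    billing_rows = [{"customer": 1, "name": "Ada Lovelace", "country_code": "GB"}]

    # G: the global schema that end users query against.
    GLOBAL_SCHEMA = ("customer_id", "customer_name", "location")

    # M: mappings that rewrite records from each source schema into G.
    MAPPINGS = {
        "crm": lambda r: {"customer_id": r["cust_id"],
                          "customer_name": r["full_name"],
                          "location": r["region"]},
        "billing": lambda r: {"customer_id": r["customer"],
                              "customer_name": r["name"],
                              "location": r["country_code"]},
    }

    def unified_view():
        """Answer queries over G by applying M to every source in S."""
        sources = {"crm": crm_rows, "billing": billing_rows}
        for source_name, rows in sources.items():
            for row in rows:
                yield MAPPINGS[source_name](row)

    for record in unified_view():
        print(record)  # every record now conforms to GLOBAL_SCHEMA

End users only ever see records in the shape of GLOBAL_SCHEMA; adding a new source only requires adding its mapping to MAPPINGS.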

There are several other big data management capabilities; they can be summarized as follows:

  • Data migration: This is the process of transferring data from one environment to another. Most migrations occur between computers and storage devices (for example, moving data from an in-house data center to the cloud).
  • Data preparation: Data that is used for analysis is often messy, inconsistent, and unstandardized. This data must be collected and cleaned into a single file or data table before an actual analysis can take place. This step is referred to as data preparation. It involves handling messy data, combining data from multiple sources, and cleaning up data that was entered manually (see the first sketch after this list).
  • Data enrichment: This step involves enhancing an existing set of data by refining it, in order to improve its quality. It can be done in several ways; common ones include adding new datasets, correcting minor errors, and extrapolating new information from the raw data.
  • Data analytics: This is the process of drawing insights from datasets by analyzing them with a variety of algorithms. Most of these steps are automated using various tools.
  • Data quality: This is the act of confirming that the data is accurate and reliable. There are several ways in which data quality is controlled. 
  • Master data management (MDM): This is a method used to define and manage the important data of an enterprise, linking critical enterprise data to one master set. The master set acts as a single source of truth for the organization.
  • Data governance: This is a data management concept that deals with a company's ability to ensure high data quality throughout the analytical process. This includes guaranteeing the availability, usability, integrity, and accuracy of data.
  • Extract, transform, load (ETL): As the name implies, this is the three-step process of extracting data from an existing repository, transforming it into the required format, and loading it into a different database or a new data warehouse (see the second sketch after this list).
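
As referenced in the data preparation item above, here is a minimal sketch; the sources, field names, and cleaning rules are illustrative assumptions. It merges two messy, manually entered sources into one table, standardizes inconsistent values, and drops rows that cannot be repaired:

    # A minimal data preparation sketch: merge, standardize, and clean
    # two messy, manually entered sources. All values are hypothetical.

    raw_sources = [
        [{"name": " Alice ", "age": "34", "city": "NYC"},
         {"name": "bob", "age": "", "city": "new york"}],
        [{"name": "Carol", "age": "forty", "city": "NYC"}],
    ]

    CITY_ALIASES = {"nyc": "New York", "new york": "New York"}

    def prepare(sources):
        """Combine all sources into one clean table, skipping unusable rows."""
        clean = []
        for source in sources:
            for row in source:
                name = row["name"].strip().title()
                city = CITY_ALIASES.get(row["city"].strip().lower(),
                                        row["city"].strip())
                try:
                    age = int(row["age"])   # reject non-numeric ages
                except ValueError:
                    continue                # drop rows we cannot repair
                clean.append({"name": name, "age": age, "city": city})
        return clean

    print(prepare(raw_sources))  # one standardized table, ready for analysis

In practice, this kind of work is usually done with a library such as pandas; plain Python is used here only to keep the sketch self-contained.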
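
And for the ETL item, here is a minimal sketch using only Python's standard library; the CSV layout and table names are assumptions, with an in-memory CSV and a SQLite database standing in for the source repository and the new warehouse:

    import csv
    import io
    import sqlite3

    # A minimal ETL sketch. The CSV layout and table names are hypothetical;
    # an in-memory CSV and SQLite database stand in for real repositories.
    SOURCE_CSV = "order_id,amount_usd\n1001,19.99\n1002,5.00\n"

    # Extract: read the raw rows out of the source repository.
    rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))

    # Transform: convert types and derive the fields the warehouse expects.
    transformed = [(int(r["order_id"]), round(float(r["amount_usd"]) * 100))
                   for r in rows]  # store amounts as integer cents

    # Load: write the transformed rows into the target warehouse table.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)")
    warehouse.executemany("INSERT INTO orders VALUES (?, ?)", transformed)
    warehouse.commit()

    print(warehouse.execute("SELECT * FROM orders").fetchall())
    # [(1001, 1999), (1002, 500)]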