- Artificial Intelligence for Big Data
- Anand Deshpande Manish Kumar
The role Ontology plays in Big Data
As we saw in the introductory chapter, data volumes are growing at a phenomenal rate, and it is impossible to model all of this data in the traditional Extract, Transform, and Load (ETL) way and still derive value from it. Data sources generate datasets in both structured and unstructured formats, and to store these data assets we have traditionally had to model the data manually around various entities. Taking Person as an example entity in the relational database world, we create a table that represents Person and link it to other entities through foreign key relationships. These entities, however, are predefined and have a fixed structure; modeling them requires manual effort, and modifying them later is difficult.
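The fixed-schema approach can be sketched with an in-memory relational table. This is a minimal illustration, not the book's own example; the table and column names are hypothetical:

```python
import sqlite3

# A minimal sketch of traditional, fixed-schema modeling: the Person entity
# must be declared up front, and any new attribute requires a schema change.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        city_id   INTEGER,
        FOREIGN KEY (city_id) REFERENCES city(city_id)
    )
""")
conn.execute("INSERT INTO person (name, city_id) VALUES (?, ?)", ("Ada", 1))
rows = conn.execute("SELECT name FROM person").fetchall()
print(rows)  # [('Ada',)]
```

Adding a new attribute (say, an email address) to this model means an `ALTER TABLE` and a coordinated change to every ETL job that loads it, which is exactly the rigidity the text describes.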
In the big data world, the schema is defined at read time rather than at write time. This gives us a much higher degree of flexibility in entity structure and data modeling. Yet even with such flexible and extensible modeling capabilities, it is very difficult to manage data assets at internet scale if the entities are not standardized across domains.
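Schema-on-read can be sketched as follows. The records and field names here are hypothetical; the point is that structure is imposed by the consumer, not declared in advance:

```python
import json

# Schema-on-read sketch: records are stored as raw JSON with no fixed,
# pre-declared structure; each consumer imposes its own schema when reading.
raw_records = [
    '{"name": "Ada", "city": "London"}',
    '{"name": "Alan", "email": "alan@example.com", "city": "Wilmslow"}',
]

# One reader's view: it cares only about name and city, and tolerates
# attributes (like email) that were never modeled in advance.
people = [
    {"name": rec.get("name"), "city": rec.get("city")}
    for rec in map(json.loads, raw_records)
]
print(people)
```

Notice that the second record carries an extra `email` field without breaking the reader, which is the flexibility the paragraph describes; the standardization problem arises when a second reader calls the same concept `town` instead of `city`.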
To facilitate web search, Google introduced the Knowledge Graph, which shifted search from a keyword-statistics-based representation to knowledge modeling.
This introduced the paradigm of searching by things, not strings. The Knowledge Graph is a very large ontology that formally describes objects in the real world. With data assets generated from heterogeneous sources at an accelerating pace, we are headed towards ever-increasing complexity. The big data paradigm describes large and complex datasets that are not manageable with traditional applications. At a minimum, we need a way to avoid false interpretations of complex data entities. Data integration and processing frameworks can be improved with methods from the field of semantic technology: by operating on things instead of text, we can improve information systems and their interoperability by identifying the context in which entities exist. Ontologies provide the semantic richness of domain-specific knowledge and its representation.
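The things-not-strings idea can be illustrated with a toy triple store. The identifiers and predicates below are hypothetical stand-ins for real ontology URIs:

```python
# A toy illustration of "things, not strings": entities are identifiers
# linked by typed (subject, predicate, object) triples, so "Jaguar" the
# animal and "Jaguar" the car maker remain distinct things that merely
# share a label string.
triples = {
    ("ex:Jaguar_animal", "rdf:type", "ex:Animal"),
    ("ex:Jaguar_cars",   "rdf:type", "ex:Company"),
    ("ex:Jaguar_animal", "ex:label", "Jaguar"),
    ("ex:Jaguar_cars",   "ex:label", "Jaguar"),
}

def things_labelled(label):
    """Resolve an ambiguous string to the distinct things behind it."""
    return sorted(s for s, p, o in triples if p == "ex:label" and o == label)

print(things_labelled("Jaguar"))  # ['ex:Jaguar_animal', 'ex:Jaguar_cars']
```

A keyword index would collapse both entities into one bag of matches for "Jaguar"; the triple representation keeps them apart and lets their `rdf:type` context disambiguate the query.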
With big data assets, it is imperative that we reduce the manual effort of turning data into information and knowledge. This is possible if we can create a means to find the correspondence between raw entities, derive a generic schema with a taxonomical representation, and map the concepts to topics in specific knowledge domains through terminological similarities and structural mappings. Such an implementation would provide automatic support for managing big data assets and integrating different data sources, resulting in fewer errors and faster knowledge derivation.
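The terminological-similarity step can be sketched with plain string matching. This is a deliberately simplified stand-in (real matchers also use synonym dictionaries and structural context), and the source fields and target concepts are hypothetical:

```python
from difflib import SequenceMatcher

# Hedged sketch of terminological matching: raw source fields are mapped
# to concepts in a hypothetical target schema purely by string similarity.
source_fields = ["emp_name", "birth_dt", "addr_city"]
ontology_concepts = ["employeeName", "birthDate", "addressCity", "salary"]

def best_match(field, concepts):
    """Pick the concept whose name is most similar to the raw field name."""
    return max(
        concepts,
        key=lambda c: SequenceMatcher(None, field.lower(), c.lower()).ratio(),
    )

mapping = {f: best_match(f, ontology_concepts) for f in source_fields}
print(mapping)
```

Even this naive matcher pairs `emp_name` with `employeeName` and `birth_dt` with `birthDate`, hinting at how correspondence-finding can be automated before structural mappings refine the result.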
We need an automated progression from glossaries, through taxonomies, to full ontologies.