
Data ingestion

Data ingestion is the process of getting data into the data system that we are building or using. It can be done manually, semi-automatically, or automatically.

In a smaller system, users often prefer a web form or other visual interface for entering data. In a larger system, such as a hospital management system, an airline management system, a government and public record management system, or a social media site, users usually prefer to automate data ingestion as much as possible. When planning data ingestion, we therefore need to answer questions such as the following:

  • How many data sources are there?
  • How many data items are there, and how large are they?
  • Will the number of data sources grow over time?
  • What is the rate at which data will be consumed?

It is important to note that an individual record is usually small, while the total volume of data is enormous. Developers therefore define a set of rules, called ingestion policies, that govern how errors, incomplete data, and similar problems are handled during ingestion. Data ingestion, along with its policies, is an integral part of a big data system.
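
To make the idea of an ingestion policy more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names IngestionPolicy, OnError, and ingest, and the sample field names, are hypothetical and do not come from any particular library. The policy states which fields are required, which missing fields may be filled with defaults, and whether a bad record is skipped or stops the run.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Tuple


class OnError(Enum):
    SKIP = "skip"  # set the bad record aside and keep going
    FAIL = "fail"  # stop the whole ingestion run


@dataclass
class IngestionPolicy:
    # Hypothetical policy object, for illustration only.
    required_fields: Tuple[str, ...]
    defaults: Dict[str, Any] = field(default_factory=dict)
    on_error: OnError = OnError.SKIP


def ingest(records: List[Dict[str, Any]], policy: IngestionPolicy):
    """Apply the policy to each record and return (accepted, rejected)."""
    accepted, rejected = [], []
    for record in records:
        # Handle incompleteness: fill in fields the policy allows to default.
        candidate = {**policy.defaults, **record}
        missing = [f for f in policy.required_fields if candidate.get(f) is None]
        if not missing:
            accepted.append(candidate)
        elif policy.on_error is OnError.FAIL:
            raise ValueError(f"record {record!r} is missing {missing}")
        else:
            # Handle errors: keep the record and the reason for later review.
            rejected.append((record, missing))
    return accepted, rejected


# Example with made-up hospital records.
policy = IngestionPolicy(
    required_fields=("patient_id", "ward"),
    defaults={"ward": "unassigned"},
)
records = [{"patient_id": 1, "ward": "A"}, {"patient_id": 2}, {"ward": "B"}]
accepted, rejected = ingest(records, policy)
```

In this sketch, the second record is accepted because its missing ward can be defaulted, while the third is rejected because no default exists for patient_id. Keeping rejected records together with the reason for rejection, rather than discarding them silently, is one possible design choice; a real system might instead write them to a separate error store.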
