
What is big data?

Big data is a relatively new term that has been gathering steam over the past few years. It refers to datasets that are too large to be stored in a traditional database system or processed by traditional data-processing pipelines. This data could be structured, semi-structured, or unstructured. Datasets in this category usually scale to terabytes or petabytes. Big data usually involves one or more of the following:

  • Velocity: Data arrives at an unprecedented speed and must be dealt with in a timely manner. Typical sources include online systems, sensors, social media, and web clickstreams.

  • Volume: Organizations collect data from a variety of sources, including business transactions, social media, and sensor or machine-to-machine data. This could involve terabytes to petabytes of data. In the past, storing it would have been a problem, but new technologies have eased the burden.
  • Variety: Data comes in all sorts of formats, ranging from structured data that fits in traditional databases to unstructured data (blobs) such as images, audio files, and text files.

These are known as the 3Vs of big data.

In addition to these, we tend to associate another term with big data:

  • Complexity: Today's data comes from multiple sources, which makes it difficult to link, match, cleanse, and transform data across systems. However, it's necessary to connect and correlate relationships, hierarchies, and multiple data linkages, or your data can quickly spiral out of control. The data must also be able to move across multiple data centers, clouds, and geographical zones.