官术网_书友最值得收藏!

Sources of big data

Technology today allows us to collect data at an astounding rate--both in terms of volume and variety. There are various sources that generate data, but in the context of big data, the primary sources are as follows:

  • Social networks: Arguably, the primary source of all big data that we know of today is the social networks that have proliferated over the past 5-10 years. This is by and large unstructured data that is represented by millions of social media postings and other data that is generated on a second-by-second basis through user interactions on the web across the world. Increase in access to the internet across the world has been a self-fulfilling act for the growth of data in social networks.
  • Media: Largely a result of the growth of social networks, media represents the millions, if not billions, of audio and visual uploads that take place on a daily basis. Videos uploaded on YouTube, music recordings on SoundCloud, and pictures posted on Instagram are prime examples of media, whose volume continues to grow in an unrestrained manner.
  • Data warehouses: Companies have long invested in specialized data storage facilities commonly known as data warehouses. A DW is essentially collections of historical data that companies wish to maintain and catalog for easy retrieval, whether for internal use or regulatory purposes. As industries gradually shift toward the practice of storing data in platforms such as Hadoop and NoSQL, more and more companies are moving data from their pre-existing data warehouses to some of the newer technologies. Company emails, accounting records, databases, and internal documents are some examples of DW data that is now being offloaded onto Hadoop or Hadoop-like platforms that leverage multiple nodes to provide a highly-available and fault-tolerant platform.
  • Sensors: A more recent phenomenon in the space of big data has been the collection of data from sensor devices. While sensors have always existed and industries such as oil and gas have been using drilling sensors for measurements at oil rigs for many decades, the advent of wearable devices, also known as the Internet Of Things such as Fitbit and Apple Watch, meant that now each inpidual could stream data at the same rate at which a few oil rigs used to do just 10 years back.

Wearable devices can collect hundreds of measurements from an inpidual at any given point in time. While not yet a big data problem as such, as the industry keeps evolving, sensor-related data is likely to become more akin to the kind of spontaneous data that is generated on the web through social network activities.

主站蜘蛛池模板: 鹤壁市| 东至县| 长白| 汶川县| 荔波县| 东兴市| 平顶山市| 民县| 阿瓦提县| 麻城市| 东丽区| 绥宁县| 河间市| 咸丰县| 朝阳区| 铅山县| 安泽县| 县级市| 莱州市| 板桥市| 泗阳县| 邹平县| 莱阳市| 五峰| 西藏| 高淳县| 莲花县| 胶南市| 明溪县| 东乡县| 重庆市| 开平市| 芦溪县| 江源县| 宁城县| 澎湖县| 志丹县| 临沧市| 绥化市| 沙田区| 海淀区|