官术网_书友最值得收藏!

The Big Data dimensional paradigm

To begin with, in simple terms, Big Data helps us deal with the three Vs: volume, velocity, and variety. Recently, two more Vs—veracity and value—were added to it, making it a five-dimensional paradigm:

  • Volume: This dimension refers to the amount of data. Look around you; huge amounts of data are being generated every second—it may be the e-mail you send, Twitter, Facebook, other social media, or it can just be all the videos, pictures, SMS, call records, or data from various devices and sensors. We have scaled up the data measuring metrics to terabytes, zettabytes and vronobytes—they are all humongous figures. Look at Facebook, it has around 10 billion messages each day; consolidated across all users, we have nearly 5 billion "likes" a day; and around 400 million photographs are uploaded each day. Data statistics, in terms of volume, are startling; all the data generated from the beginning of time to 2008 is kind of equivalent to what we generate in a day today, and I am sure soon it will be an hour. This volume aspect alone is making the traditional database unable to store and process this amount of data in a reasonable and useful time frame, though a Big Data stack can be employed to store, process, and compute amazingly large datasets in a cost-effective, distributed, and reliably efficient manner.
  • Velocity: This refers to the data generation speed, or the rate at which data is being generated. In today's world, where the volume of data has made a tremendous surge, this aspect is not lagging behind. We have loads of data because we are generating it so fast. Look at social media; things are circulated in seconds and they become viral, and the insight from social media is analyzed in milliseconds by stock traders and that can trigger lot of activity in terms of buying or selling. At target point of sale counters, it takes a few seconds for a credit card swipe and, within that, fraudulent transaction processing, payment, bookkeeping, and acknowledgement are all done. Big Data gives me power to analyze the data at tremendous speed.
  • Variety: This dimension tackles the fact that the data can be unstructured. In the traditional database world, and even before that, we were used to a very structured form of data that kind of neatly fitted into the tables. But today, more than 80 percent of data is unstructured; for example, photos, video clips, social media updates, data from a variety of sensors, voice recordings, and chat conversations. Big Data lets you store and process this unstructured data in a very structured manner; in fact, it embraces the variety.
  • Veracity: This is all about validity and the correctness of data. How accurate and usable is the data? Not everything out of millions and zillions of data records is corrected, accurate, and referable. That's what veracity actually is: how trustworthy the data is, and what the quality of data is. Two examples of data with veracity are Facebook and Twitter posts with nonstandard acronyms or typos. Big Data has brought to the table the ability to run analytics on this kind of data. One of the strong reasons for the volume of data is its veracity.
  • Value: As the name suggests, this is the value the data actually holds. Unarguably, it's the most important V or dimension of Big Data. The only motivation for going towards Big Data for the processing of super-large datasets is to derive some valuable insight from it; in the end, it's all about cost and benefits.
主站蜘蛛池模板: 侯马市| 武穴市| 花莲县| 上林县| 房产| 湖北省| 科技| 荥阳市| 东莞市| 民丰县| 巢湖市| 鹤庆县| 都江堰市| 通化市| 抚州市| 临朐县| 渝中区| 莎车县| 渭源县| 澄江县| 景泰县| 天柱县| 广安市| 城口县| 黄梅县| 仁化县| 磐安县| 灵石县| 温泉县| 乐昌市| 云南省| 微山县| 沙田区| 新郑市| 江源县| 东兰县| 闻喜县| 永顺县| 盐津县| 准格尔旗| 邻水|