官术网_书友最值得收藏!

The world of Big Data

Since the last decade, the amount of data being created is more than 20 terabytes per second and this size is only increasing. Not only volume and velocity but this data is also of a different variety, that is, structured and semi structured in nature, which means that data might be coming from blog posts, tweets, social network interactions, photos, videos, continuously generated log messages about what users are doing, and so on. Hence, Big Data is a combination of transactional data and interactive data. This large set of data is further used by organizations for decision making. Storing, analyzing, and summarizing these large datasets efficiently and cost effectively have become among the biggest challenges for these organizations.

In 2003, Google published a paper on the scalable distributed filesystem titled Google File System (GFS), which uses a cluster of commodity hardware to store huge amounts of data and ensure high availability by using the replication of data between nodes. Later, Google published an additional paper on processing large, distributed datasets using MapReduce (MR).

For processing Big Data, platforms such as Hadoop, which inherits the basics from both GFS and MR, were developed and contributed to the community. A Hadoop-based platform is able to store and process continuously growing data in terabytes or petabytes.

Note

The Apache Hadoop software library is a framework that allows the distributed processing of large datasets across clusters of computers.

However, Hadoop is designed to process data in the batch mode and the ability to access data randomly and near real time is completely missing. In Hadoop, processing smaller files has a larger overhead compared to big files and thus is a bad choice for low latency queries.

Later, a database solution called NoSQL evolved with multiple flavors, such as a key-value store, document-based store, column-based store, and graph-based store. NoSQL databases are suitable for different business requirements. Not only do these different flavors address scalability and availability but also take care of highly efficient read/write with data growing infinitely or, in short, Big Data.

Note

The NoSQL database provides a fail-safe mechanism for the storage and retrieval of data that is modeled in it, somewhat different from the tabular relations used in many relational databases.

主站蜘蛛池模板: 油尖旺区| 石台县| 葫芦岛市| 富蕴县| 桦川县| 黄龙县| 广州市| 上杭县| 汕尾市| 濉溪县| 黑河市| 板桥市| 正阳县| 临颍县| 吉安县| 托克逊县| 奉节县| 蕉岭县| 三亚市| 泌阳县| 盱眙县| 潮安县| 黑河市| 福海县| 政和县| 金川县| 景宁| 长治县| 长宁区| 淮南市| 湟源县| 海兴县| 邯郸市| 云阳县| 蒲城县| 东台市| 丰镇市| 邯郸县| 阿拉尔市| 古丈县| 社会|