- HBase Essentials
- Nishant Garg
- 2021-08-05 17:24:19
The world of Big Data
Over the last decade, data has been created at a rate of more than 20 terabytes per second, and this rate is only increasing. Beyond its volume and velocity, this data also comes in a wide variety of forms: it is structured and semi-structured in nature, and it may come from blog posts, tweets, social network interactions, photos, videos, continuously generated log messages about what users are doing, and so on. Big Data is therefore a combination of transactional data and interactive data. Organizations use these large datasets for decision making, and storing, analyzing, and summarizing them efficiently and cost-effectively has become one of their biggest challenges.
In 2003, Google published a paper titled "The Google File System", describing GFS, a scalable distributed filesystem that uses a cluster of commodity hardware to store huge amounts of data and ensures high availability by replicating data across nodes. Later, Google published another paper on processing large, distributed datasets using MapReduce (MR).
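To make the replication idea concrete, here is a minimal Python sketch that splits a file into fixed-size chunks and places each chunk on several distinct nodes, so that losing any single node loses no data. The tiny chunk size, the replication factor of 3, the round-robin placement policy, and the function names `place_chunks` and `all_chunks_available` are illustrative assumptions; GFS itself relies on a master server that chooses replica locations using more elaborate criteria.

```python
from typing import Dict, List


def place_chunks(data: bytes, chunk_size: int, nodes: List[str],
                 replication: int = 3) -> Dict[int, List[str]]:
    """Map each chunk index to the nodes holding a replica of it.

    Hypothetical round-robin policy: chunk i is stored on nodes
    i, i+1, ..., i+replication-1 (mod len(nodes)), so every replica
    of a given chunk lands on a different node.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_chunks = (len(data) + chunk_size - 1) // chunk_size  # ceiling division
    placement: Dict[int, List[str]] = {}
    for i in range(num_chunks):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


def all_chunks_available(placement: Dict[int, List[str]], failed: str) -> bool:
    """Return True if every chunk still has a replica on a node other than `failed`."""
    return all(any(node != failed for node in replicas)
               for replicas in placement.values())


nodes = ["node-a", "node-b", "node-c", "node-d"]
layout = place_chunks(b"x" * 10, chunk_size=4, nodes=nodes)
print(layout)
print(all_chunks_available(layout, failed="node-b"))
```

Ten bytes split into 4-byte chunks yields three chunks, each stored on three different nodes, so the failure of `node-b` leaves every chunk readable.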
To process Big Data, platforms such as Hadoop, which builds on the core ideas of both GFS and MR, were developed and contributed to the open source community. A Hadoop-based platform can store and process continuously growing data at terabyte or petabyte scale.
Note
The Apache Hadoop software library is a framework that allows the distributed processing of large datasets across clusters of computers.
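The MapReduce model that Hadoop inherits can be illustrated without a cluster. The following Python sketch runs the three phases of a word count (map, shuffle, and reduce) in a single process. The function names are hypothetical; a real Hadoop job would instead implement `Mapper` and `Reducer` classes and let the framework distribute the work across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit a (word, 1) pair for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


records = ["big data is big", "hadoop processes big data"]
print(word_count(records))
```

Because each call to `map_phase` touches only one record and each call to `reduce_phase` touches only one key, both phases can run in parallel on many machines; only the shuffle needs to move data between them.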
However, Hadoop is designed to process data in batch mode, and it lacks the ability to access data randomly and in near real time. Processing many small files in Hadoop also incurs more overhead than processing a few large ones, which makes Hadoop a poor choice for low-latency queries.
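The gap between batch processing and random access can be sketched in a few lines. In this hypothetical Python example, a batch-style lookup reads every record to answer a question about a single key, while a store that keeps its keys sorted (an assumption made here because it permits binary search with the standard `bisect` module) can locate one key in logarithmic time. This illustrates the access pattern only; it is not how Hadoop or any particular database is implemented.

```python
import bisect
from typing import List, Optional, Tuple

Record = Tuple[str, str]  # (key, value)


def batch_lookup(records: List[Record], key: str) -> Tuple[Optional[str], int]:
    """Read the whole dataset, as a batch job would; return (value, records read)."""
    found: Optional[str] = None
    examined = 0
    for k, v in records:
        examined += 1
        if k == key:
            found = v
    return found, examined


class SortedStore:
    """Hypothetical store that keeps keys sorted to support random reads."""

    def __init__(self, records: List[Record]) -> None:
        pairs = sorted(records)
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key: str) -> Optional[str]:
        # Binary search: O(log n) comparisons instead of a full scan.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


records = [(f"row{n:05d}", f"value{n}") for n in range(10_000)]
print(batch_lookup(records, "row04242"))  # reads all 10,000 records
print(SortedStore(records).get("row04242"))
```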
Later, a class of databases known as NoSQL emerged in multiple flavors, such as key-value stores, document stores, column-oriented stores, and graph stores, each suited to different business requirements. These flavors not only address scalability and availability but also provide highly efficient reads and writes as data keeps growing without bound; in short, they are built for Big Data.
Note
The NoSQL database provides a fail-safe mechanism for storing and retrieving data that is modeled differently from the tabular relations used in many relational databases.
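To illustrate what modeling data other than as tabular relations can look like, the following Python sketch stores user data three ways: as a relational-style row with a fixed set of columns, as an opaque value in a key-value store, and as sparse column-oriented rows grouped into column families. The layout is an illustrative assumption loosely modeled on column-family stores, and the names `users`, `info`, `contact`, and `get_cell` are hypothetical.

```python
import json
from typing import Dict, Optional

# Relational style: every row has the same fixed set of columns.
relational_row = ("u1001", "Alice", "alice@example.com", None)  # (id, name, email, phone)

# Key-value style: the store sees only a key and an opaque value.
kv_store: Dict[str, str] = {
    "user:u1001": json.dumps({"name": "Alice", "email": "alice@example.com"}),
}

# Column-oriented style: row key -> column family -> column qualifier -> value.
# Rows are sparse: a column that a row does not use simply has no entry.
ColumnRow = Dict[str, Dict[str, str]]
users: Dict[str, ColumnRow] = {
    "u1001": {
        "info": {"name": "Alice"},
        "contact": {"email": "alice@example.com"},
    },
    "u1002": {
        "info": {"name": "Bob"},
        "contact": {"phone": "555-0100", "twitter": "@bob"},
    },
}


def get_cell(table: Dict[str, ColumnRow], row: str,
             family: str, qualifier: str) -> Optional[str]:
    """Return one cell value, or None if the row or column does not exist."""
    return table.get(row, {}).get(family, {}).get(qualifier)


print(json.loads(kv_store["user:u1001"])["email"])
print(get_cell(users, "u1002", "contact", "twitter"))
print(get_cell(users, "u1001", "contact", "phone"))
```

In this sketch the second user carries columns the first one lacks without any schema change, whereas the relational row must reserve a slot (here `None`) for every column.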