官术网_书友最值得收藏!

The world of Big Data

Since the last decade, the amount of data being created is more than 20 terabytes per second and this size is only increasing. Not only volume and velocity but this data is also of a different variety, that is, structured and semi structured in nature, which means that data might be coming from blog posts, tweets, social network interactions, photos, videos, continuously generated log messages about what users are doing, and so on. Hence, Big Data is a combination of transactional data and interactive data. This large set of data is further used by organizations for decision making. Storing, analyzing, and summarizing these large datasets efficiently and cost effectively have become among the biggest challenges for these organizations.

In 2003, Google published a paper on the scalable distributed filesystem titled Google File System (GFS), which uses a cluster of commodity hardware to store huge amounts of data and ensure high availability by using the replication of data between nodes. Later, Google published an additional paper on processing large, distributed datasets using MapReduce (MR).

For processing Big Data, platforms such as Hadoop, which inherits the basics from both GFS and MR, were developed and contributed to the community. A Hadoop-based platform is able to store and process continuously growing data in terabytes or petabytes.

Note

The Apache Hadoop software library is a framework that allows the distributed processing of large datasets across clusters of computers.

However, Hadoop is designed to process data in the batch mode and the ability to access data randomly and near real time is completely missing. In Hadoop, processing smaller files has a larger overhead compared to big files and thus is a bad choice for low latency queries.

Later, a database solution called NoSQL evolved with multiple flavors, such as a key-value store, document-based store, column-based store, and graph-based store. NoSQL databases are suitable for different business requirements. Not only do these different flavors address scalability and availability but also take care of highly efficient read/write with data growing infinitely or, in short, Big Data.

Note

The NoSQL database provides a fail-safe mechanism for the storage and retrieval of data that is modeled in it, somewhat different from the tabular relations used in many relational databases.

主站蜘蛛池模板: 碌曲县| 广西| 德安县| 栾川县| 中江县| 东安县| 囊谦县| 循化| 军事| 商水县| 中山市| 江永县| 安宁市| 城固县| 会泽县| 云浮市| 定南县| 苏州市| 张家口市| 阿荣旗| 修水县| 密山市| 昌江| SHOW| 西藏| 丹东市| 若羌县| 永宁县| 江山市| 司法| 黄山市| 大理市| 泊头市| 洪江市| 新营市| 伊宁县| 灵璧县| 平凉市| 大冶市| 庄浪县| 东山县|