
Big data application architecture

Big data, such as documents, weblogs, social network data, sensor data, and others, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can deploy database capabilities using systems such as Cassandra or HBase. HBase is built atop Hadoop, whereas Cassandra runs independently of Hadoop but can integrate with it. Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark's MLlib.
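To make the MapReduce paradigm more concrete, here is a minimal single-process sketch in Python that counts words. It only illustrates the map, shuffle, and reduce steps; it is not how Hadoop implements them, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this illustration. In a real cluster, each map and reduce call would run on a different processing node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each document is an independent subproblem, so the map calls
    # could be distributed across nodes; here they run sequentially.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big models", "data nodes process data"]
    print(word_count(docs))
    # {'big': 2, 'data': 3, 'models': 1, 'nodes': 1, 'process': 1}
```

The key design point is that `map_phase` looks at one document at a time and `reduce_phase` looks at one key at a time, which is what allows the work to be split across nodes.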

MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own distributed filesystem, HDFS, and a resource management and job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database that focuses on random read/write access for distributed storage. More information is available at https://hbase.apache.org/.
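As a small illustration of the JSON-like document format mentioned for MongoDB, the following sketch builds one document as a Python dictionary and round-trips it through JSON using only the standard library. The sensor reading and all of its field names are hypothetical, and no MongoDB driver is involved; MongoDB itself stores documents in a binary encoding of this structure (BSON).

```python
import json

# Hypothetical sensor reading; the field names are illustrative only.
reading = {
    "sensor_id": "s-17",
    "timestamp": "2016-01-01T12:00:00Z",
    "values": {"temperature": 21.5, "humidity": 0.43},  # nested document
    "tags": ["lab", "floor-2"],                         # array field
}

# Serialize the document to JSON text and parse it back.
encoded = json.dumps(reading)
decoded = json.loads(encoded)

if __name__ == "__main__":
    print(encoded)
```

Unlike a row in a relational table, such a document can nest other documents and arrays, and two documents in the same collection do not need to share the same fields.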