- Machine Learning in Java
- AshishSingh Bhatia Bostjan Kaluza
- 2021-06-10 19:30:09
Big data application architecture
Big data, such as documents, weblogs, social network data, and sensor readings, is stored in a NoSQL database, such as MongoDB, or in a distributed filesystem, such as HDFS. When we deal with structured data, we can add database capabilities using systems such as Cassandra or HBase (the latter is built atop Hadoop). Data processing follows the MapReduce paradigm, which breaks a data processing problem into smaller subproblems and distributes the resulting tasks across processing nodes. Finally, machine learning models are trained with machine learning libraries such as Mahout and Spark.
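To make the MapReduce paradigm more concrete, the following is a minimal single-JVM sketch in plain Java of the classic word-count problem. It does not use the Hadoop API; the class name `MapReduceSketch` and the methods `map`, `reduce`, and `wordCount` are hypothetical names chosen for illustration, and a parallel stream merely stands in for the cluster of processing nodes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch only: imitates the map, shuffle, and reduce phases
// inside one JVM. A real Hadoop job would distribute them across nodes.
public class MapReduceSketch {

    // Map phase: turn one input record (a line of text) into (word, 1) pairs.
    public static Stream<Map.Entry<String, Integer>> map(String line) {
        return Stream.of(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .map(word -> Map.entry(word, 1));
    }

    // Reduce phase: combine all counts that were emitted for the same word.
    public static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Each line is an independent subproblem, so the lines can be
        // mapped in parallel. The grouping step plays the role of the
        // shuffle: it collects the values emitted for each key.
        Map<String, List<Integer>> grouped = lines.parallelStream()
                .flatMap(MapReduceSketch::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Apply the reducer once per key.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((word, counts) -> result.put(word, reduce(counts)));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "big data needs big storage",
                "data processing follows MapReduce");
        System.out.println(wordCount(lines));
    }
}
```

The point of the split is that `map` looks at only one record and `reduce` looks at only one key, so neither needs to see the whole dataset. That independence is what lets a framework such as Hadoop run the same two functions on many nodes at once.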
MongoDB is a NoSQL database that stores documents in a JSON-like format. You can read more about it at https://www.mongodb.org. Hadoop is a framework for the distributed processing of large datasets across a cluster of computers. It includes its own filesystem, HDFS, and a job scheduling framework, YARN, and it implements the MapReduce approach for parallel data processing. You can learn more about Hadoop at http://hadoop.apache.org/. Cassandra is a distributed database management system that was built to provide fault-tolerant, scalable, and decentralized storage. More information is available at http://cassandra.apache.org/. HBase is another database that focuses on random read/write access to distributed storage. More information is available at https://hbase.apache.org/.