MapR Hadoop distribution

MapR was one of the first companies to build its own Hadoop distribution. MapR went one step further than other vendors and replaced Hadoop's HDFS with its own proprietary filesystem, MapRFS. MapRFS supports enterprise-grade features such as better data management, fault tolerance, and ease of use. One key difference between HDFS and MapRFS is that MapRFS allows random writes. Additionally, unlike HDFS, it can be mounted through NFS into a client's local filesystem. MapRFS is POSIX-compliant (HDFS offers only a POSIX-like implementation), so Linux developers can apply their existing knowledge and run familiar commands seamlessly. Because of these features, MapRFS and similar filesystems can serve OLTP-like business requirements.
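To make the random-write point concrete, the following Python sketch uses only standard POSIX-style file I/O, which is what a client works with once MapRFS is NFS-mounted. The mount point `/mapr/my.cluster.com` and the helper names `overwrite_at` and `demo` are hypothetical and chosen for illustration. The sketch falls back to a local temporary directory when that mount is absent, so it runs anywhere, and it calls no MapR-specific API.

```python
import os
import tempfile

# Hypothetical NFS mount point for a MapR cluster; adjust to your environment.
MAPR_MOUNT = "/mapr/my.cluster.com"


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Overwrite bytes in place at `offset` using ordinary file I/O.

    On a POSIX-compliant, NFS-mounted filesystem this is a plain random write.
    HDFS does not support writes at arbitrary offsets in an existing file.
    """
    with open(path, "r+b") as f:  # open for read/write without truncating
        f.seek(offset)
        f.write(data)


def demo(base_dir: str) -> bytes:
    """Create a small record file, update one field in place, and return it."""
    path = os.path.join(base_dir, "records.txt")
    with open(path, "wb") as f:
        f.write(b"id=1 status=NEW\n")
    # Replace "NEW" with "OLD" in the middle of the file, not at the end.
    overwrite_at(path, len(b"id=1 status="), b"OLD")
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Fall back to a local temporary directory when the MapR mount is absent.
    base = MAPR_MOUNT if os.path.isdir(MAPR_MOUNT) else tempfile.mkdtemp()
    print(demo(base))
```

On HDFS, the same update would require rewriting the file, because existing HDFS files can be appended to but not modified at an arbitrary offset.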

Pros of the MapR Hadoop distribution include the following:

Its core filesystem is written in C rather than Java, which makes it the only Hadoop distribution whose storage layer has no Java dependencies
It offers stable, production-ready Hadoop clusters
MapRFS is easy to use, and it provides multi-node filesystem access through a locally mounted NFS share

Cons of the MapR Hadoop distribution include the following:

It is becoming increasingly proprietary rather than open source. Many companies want to avoid vendor lock-in, and MapR does not fit those requirements.

Each of the distributions we covered, including the open source one, has its own business strategy and features. Choosing the right Hadoop distribution for a problem depends on multiple factors, such as the following:

What kind of application needs to be addressed by Hadoop
The type of application (transactional or analytical) and its key data processing requirements
The required investment and the project implementation timeline
Support and training requirements of a given project

Apache Hadoop 3 Quick Start Guide