In the free and open source software market,
Red Hat makes money by taking the Linux kernel (the core of an open source
operating system), bundling all the components it requires, building a simple
installer, and selling paid support to customers.
In the same way, many companies provide
enterprise editions and paid support on top of the
Apache Hadoop distribution.
Free Open Source Hadoop Distribution
Apache Hadoop
o Core Hadoop distribution, used as the base by all other distributions
o Complex cluster setup and no commercial support
o Manual installation and integration of Hadoop ecosystem components such as Hive, HBase, and Pig
o A good choice for free trials, tests, and demos
Other Popular Hadoop Distributions
Cloudera Hadoop
o Doug Cutting, co-creator of Hadoop, is its chief architect
o Market leader in the Hadoop space, having released the first commercial Hadoop distribution
o Highly active contributor of code to the Hadoop ecosystem
o Provides Cloudera Distribution for Hadoop (CDH) parcels, as well as Cloudera Manager, a powerful management and monitoring tool for Hadoop administration
o Its approach is to take components it deems mature and retrofit them into the existing production-ready open source libraries included in its distribution
o Founded in 2008, with its core distribution based on 100% open source Apache Hadoop
o CDH may be downloaded from Cloudera’s website at no charge for clusters of up to 50 data nodes, but without technical support or Cloudera Manager
Hortonworks
o Fast-growing company founded in 2011
o Another major player in the Hadoop market
o Spun out of Yahoo, and has the largest number of committers and code contributors for the Hadoop ecosystem components
o Releases the Hortonworks Data Platform (HDP), which includes Hadoop as well as related tooling and projects
o Collaborates with major data management companies such as Teradata, Microsoft, Informatica, and SAS to provide Hadoop solutions integrated with their own product sets
o Uses Apache Ambari for management, Stinger for queries, and Solr for search
Amazon Web Services Elastic MapReduce (AWS EMR) Hadoop
o Hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3)
o Provides management software and GUI support
o Provides enhanced data protection
MapR Hadoop
o Provides a complete distribution of Apache Hadoop and related projects that is independent of the Apache Software Foundation
o MapR promotes itself as the only Hadoop distribution that provides full data protection, no single points of failure, and significant ease-of-use advantages
o It replaces the underlying HDFS with its own proprietary file system, MapR-FS, which is intended to improve data management efficiency, reliability, and ease of use
o Three MapR editions are available: M3, M5, and M7
o The M3 edition is free and available for unlimited production use
o The M5 edition is an intermediate-level subscription software offering
o The M7 edition is a complete distribution for Apache Hadoop and HBase that includes Pig, Hive, Sqoop, and much more
Pivotal Greenplum Hadoop
o Integrates EMC’s massively parallel processing (MPP) database technology (formerly known as Greenplum, and now known as HAWQ) with Apache Hadoop
o High-performance Hadoop distribution with true SQL processing for Hadoop
o SQL-based queries and other business intelligence tools can be used to analyze data stored in HDFS
Intel Hadoop
o Provides performance optimizations for Intel Xeon processors, Intel SSD storage, and Intel 10GbE networking
o Provides data security through encryption and decryption in HDFS
o Supports role-based access control with cell-level granularity in HBase
o Improves Hive query performance
o Supports statistical analysis with the open source statistical package R, and analytical graphics through Intel Graph Builder
IBM InfoSphere BigInsights
o Focuses on adding value on top of the open source Hadoop stack
o Comes with a built-in, browser-based spreadsheet tool called BigSheets
o Good support for adaptive real-time analytics, and text analytics capabilities through AQL and Jaql
Microsoft Hadoop on Windows Azure
o Microsoft HDInsight is an integration of Apache Hadoop and the Hortonworks Data Platform on Microsoft’s cloud platform, Windows Azure
o Currently supports Pig, Hive, and Sqoop
DataStax Hadoop
o The DataStax Enterprise (DSE) big data platform consists of open source tools such as Apache Hadoop, Cassandra, Solr, Hive, Pig, and Mahout
o DSE is designed to manage real-time and enterprise search data in the same database cluster
o It also comes with OpsCenter Enterprise, which allows DSE clusters to be managed through a central web interface
Apart from these, there are many
other Hadoop distributions, all of them built on Apache Hadoop, which is open
source under the Apache License 2.0.