“Hadoop is generating increasing interest for use in business-critical applications that rely upon low latency and consistency,” said Donnie Berkholz, an analyst with RedMonk. “This latest release from MapR targets these needs with improved support for real-time processing through the incorporation of new tools as well as new releases of existing ones.”
“Organizations can now maximize business impact and minimize risk by running operational applications along with real-time analytics on Hadoop,” said Tomer Shiran, vice president of product management, MapR Technologies. “MapR continues to lead the market by integrating the latest open source components into a mission-critical, multi-tenant platform with self-healing high availability (HA), disaster recovery (DR), and data protection capabilities.”
Unbiased Open Source Approach
MapR provides users with maximum flexibility to pick the right tool for the job with an unbiased set of open source software on Hadoop. The latest MapR Distribution including Hadoop now contains:
· Multiple batch processing frameworks: MapReduce 1.x and 2.x (YARN-based), and Spark (0.9, 1.0.2)
· Five SQL-on-Hadoop technologies: Hive (0.11, 0.12, 0.13), Drill (0.5), SparkSQL (1.0.2), Impala (1.3.1) and certified integration with HP-Vertica
· Two NoSQL technologies: HBase (0.94.21, 0.98.4), MapR-DB
· Three machine-learning and graph libraries: Mahout (0.8, 0.9), MLlib (0.9, 1.0.2), GraphX
The MapR Difference
A single MapR cluster can run thousands of disparate jobs a day and accommodate users and departments with different needs. MapR offers several unique operational features including:
· Backward compatibility across versions of open source software packages and ability to upgrade at an organization’s own pace
· Heterogeneous processing of MapReduce 1.x and 2.x (YARN-based) applications running on the same set of nodes
· Advanced multi-tenancy capability to isolate and protect departmental data on specific nodes, with the flexibility to schedule jobs only to those nodes via the label-based scheduling feature, which is now also available for YARN jobs
· Fine-grained resource management for applications deployed via YARN, covering disk I/O along with memory and CPU, to enable consistently high performance
· Comprehensive security through easily deployable end-to-end, wire-level security that now also covers all YARN applications
· A no-NameNode architecture that enables self-healing clusters that can handle multiple simultaneous failures as well as an unlimited number of files and tables
· Snapshots that provide point-in-time consistency, enabling system-of-record data to be stored on Hadoop
· Business continuity with immediate remote-site recovery for batch and real-time applications
· NFS support that enables true random read-write capability so applications can write to and update data on Hadoop even while analysis is in progress
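As a rough illustration of the random read-write capability described in the last item, the following Python sketch overwrites a few bytes in the middle of an existing file using ordinary file calls. It is a minimal sketch under stated assumptions, not MapR's implementation: the helper name update_record_in_place is hypothetical, and a local temporary file stands in for a path under an NFS mount of the cluster, since no mount point is given in this announcement.

```python
import os
import tempfile


def update_record_in_place(path, offset, new_bytes):
    """Overwrite len(new_bytes) bytes at `offset` without rewriting the file.

    Hypothetical helper for illustration only; it uses nothing but
    standard POSIX-style file operations.
    """
    with open(path, "r+b") as f:   # open for reading and writing, no truncation
        f.seek(offset)             # jump to an arbitrary position in the file
        f.write(new_bytes)         # overwrite the bytes at that position
        f.flush()
        os.fsync(f.fileno())       # ask the OS to push the change to storage


# Assumption: on a real cluster this path would sit under an NFS mount.
# A local temporary file is used here so the sketch runs anywhere.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"status=PENDING;id=42\n")

# Replace the 7-byte value "PENDING" (starting at byte 7) with a padded "DONE".
update_record_in_place(demo_path, 7, b"DONE   ")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

The point of the sketch is that an in-place update needs no special client library when the storage is exposed as a regular file system; an append-only store would instead require the whole file to be rewritten.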