This release introduces a design approach that will support multiple compute frameworks, decoupling the user experience from the underlying Hadoop processing paradigms, including MapReduce, Apache Spark, and Apache Tez. The architectural approach is intended to “future-proof” the process of collecting, blending, transforming, and distributing data, providing a consistent user experience while still taking advantage of the powerful native performance of the rapidly evolving compute frameworks that run on Hadoop.
“Our research shows that the most common workloads being shifted to Hadoop are large-scale data transformations,” said Jeff Kelly, Principal Research Contributor, the Wikibon Project. “Syncsort continues to make waves in the Big Data ecosystem by innovating easier, more effective ways to create these transformations in Hadoop and to move expensive enterprise data warehouse and mainframe workloads across emerging Hadoop frameworks.”
The new release includes an “Intelligent Execution Layer” that allows users to visually design data transformations once and then run them anywhere – across Hadoop, Linux, Windows, or Unix, on-premises or in the cloud – while maintaining the performance of a native implementation. The architecture includes dozens of special-purpose algorithms and an advanced optimizer that automatically selects the best execution path for each job based on the compute frameworks available, the characteristics of the data set, and Hadoop cluster conditions.
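Syncsort has not published the internals of the Intelligent Execution Layer, so the sketch below is only a rough, hypothetical illustration of the “design once, run anywhere” idea described above: a data flow is defined independently of any engine, and a simple selector picks an execution engine from whatever the cluster offers. All class, object, and method names here are invented for illustration and are not Syncsort APIs.

```scala
// Hypothetical sketch only – illustrative names, not Syncsort APIs.
// Idea: the flow is designed once; the engine is chosen at run time.

// A data flow described independently of any compute framework.
case class DataFlow(name: String, steps: Seq[String])

// Facts a selector can inspect before choosing an execution path.
case class ClusterContext(frameworks: Set[String], inputSizeGb: Double)

// One pluggable execution engine per compute framework.
trait ExecutionEngine {
  def framework: String
  def run(flow: DataFlow): Unit
}

class MapReduceEngine extends ExecutionEngine {
  val framework = "mapreduce"
  def run(flow: DataFlow): Unit = println(s"Running ${flow.name} as MapReduce jobs")
}

class SparkEngine extends ExecutionEngine {
  val framework = "spark"
  def run(flow: DataFlow): Unit = println(s"Running ${flow.name} on Apache Spark")
}

class LocalEngine extends ExecutionEngine {
  val framework = "local"
  def run(flow: DataFlow): Unit = println(s"Running ${flow.name} on a single server")
}

// A toy "optimizer": prefer Spark for large inputs when it is available,
// otherwise fall back to MapReduce or a single-server run.
object IntelligentExecution {
  def choose(ctx: ClusterContext, engines: Seq[ExecutionEngine]): ExecutionEngine = {
    val byName = engines.map(e => e.framework -> e).toMap
    if (ctx.frameworks.contains("spark") && ctx.inputSizeGb > 10) byName("spark")
    else if (ctx.frameworks.contains("mapreduce")) byName("mapreduce")
    else byName("local")
  }
}

object Demo extends App {
  val flow = DataFlow("warehouse-offload", Seq("extract", "join", "aggregate", "load"))
  val ctx = ClusterContext(frameworks = Set("mapreduce", "spark"), inputSizeGb = 250.0)
  val engine = IntelligentExecution.choose(ctx, Seq(new MapReduceEngine, new SparkEngine, new LocalEngine))
  engine.run(flow) // the same flow definition could run on any of the engines
}
```

In the actual product the choice would also weigh data-set characteristics and cluster conditions, as noted above; the point of the sketch is simply that the flow definition never changes when the engine does.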
“Many organizations are looking to liberate their legacy data and budgets by shifting workloads from enterprise data warehouses and mainframes into Hadoop, but they are challenged by the complexities of the rapidly-improving Hadoop stack,” said Tendu Yogurtcu, General Manager of Syncsort’s Big Data business. “This new release makes it extremely easy to shift sophisticated data flows to Hadoop and to create new data transformations, taking advantage of the most powerful Hadoop-based compute paradigms as they evolve.”
Highlights of the new release include the ability for users to:
· Avoid application obsolescence by deploying and running the same highly efficient data flows on or off Hadoop, on-premises or in the cloud
· Isolate the transformation logic from the underlying complexities of Hadoop using a new Intelligent Execution Layer that will allow Syncsort to deliver native support for multiple compute frameworks such as Apache Spark and Tez
· Leverage best-in-class, one-step data ingestion capabilities for Hadoop – ingesting data directly into Big Data formats such as Avro and Parquet without the need for staging (see the illustrative sketch after this list)
· Load Apache Spark engines with legacy mainframe data sets, including VSAM and binary sequential files with COBOL copybook metadata – all via a new Cloudera-certified Apache Spark mainframe connector from Syncsort
· Support governance initiatives with advanced metadata management and data lineage, including HCatalog support
· Utilize best-in-class data discovery, analysis, and visualization with the new Syncsort QlikView QVX Connector and the Tableau Data Extract Connector
· Achieve high performance parallel data loads to Hive, Vertica, and Greenplum
· Support NoSQL data stores such as Apache Cassandra, Apache HBase, and MongoDB
· Monitor and manage Apache Hadoop transformations with customized dashboards built on operational metadata and RESTful APIs, shipped in Docker containers
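The one-step ingestion highlight above is driven through Syncsort’s visual design tool, whose job definitions are not shown in this release. As a generic illustration of the end result – data landing directly in columnar Big Data formats with no intermediate staging step – the Apache Spark (Scala) snippet below reads a delimited extract and writes it straight to Parquet and Avro. The file paths, schema options, and application name are hypothetical, and writing Avro requires the separate spark-avro module on the classpath; this is plain Spark code, not Syncsort’s tooling.

```scala
// Generic Apache Spark illustration of landing data directly in Parquet/Avro;
// not Syncsort's tooling or API. Paths and options are hypothetical.
import org.apache.spark.sql.SparkSession

object IngestToColumnar extends App {
  val spark = SparkSession.builder()
    .appName("one-step-ingest-example")
    .getOrCreate()

  // Read the raw source once (a delimited extract in this example).
  val raw = spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("hdfs:///landing/orders.csv") // hypothetical path

  // Write straight to columnar formats – no intermediate staging table.
  raw.write.mode("overwrite").parquet("hdfs:///warehouse/orders_parquet")
  // Avro output needs the spark-avro module (e.g. --packages org.apache.spark:spark-avro_2.12:<version>).
  raw.write.mode("overwrite").format("avro").save("hdfs:///warehouse/orders_avro")

  spark.stop()
}
```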