Moving Big Data from the IBM mainframe to Hadoop

Syncsort has announced another milestone contribution to the Apache Hadoop ecosystem, incorporating powerful technology into the Apache Sqoop open source project that will allow Hadoop users to easily import and transform data coming from the IBM System z mainframe environment.

“Many organizations are looking to increase efficiency and save money by moving targeted mainframe data and workload processing to Hadoop,” said Charles Zedlewski, vice president, products, Cloudera. “Taken together, Apache Sqoop and Syncsort’s open source contributions will facilitate the importation and transformation of all types of mainframe data, allowing customers to take full advantage of Hadoop’s advanced analytical capabilities.”


As Hadoop has emerged as the dominant data processing platform for the enterprise, there is a growing need to rapidly move mainframe data and transform it into an understandable next-generation Big Data format. Syncsort’s contributions to Apache Sqoop will make it much more cost-effective to store mainframe historical data in HDFS and will also help free up mainframe CPU cycles by allowing customers to move expensive data processing workloads from the mainframe to Hadoop.

The new technology has been committed as SQOOP-1272. It supports loading multiple mainframe data sets to the nodes of a Hadoop cluster in parallel and transforming them into any file format that Apache Sqoop supports. This makes it simple for organizations to integrate data from mainframe databases, such as DB2/z, IMS, Adabas, IDMS, and Datacom, with the rest of the data in a typical next-generation Big Data environment.
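
As a rough illustration of the parallel load described above, the sketch below assembles a command line for Sqoop's `import-mainframe` tool, which is how this contribution is exposed in Sqoop releases that include it. It is a minimal sketch under stated assumptions, not Syncsort's or Cloudera's recommended invocation: the host name, data set name, user name, and target directory are hypothetical placeholders, and the helper function `build_import_mainframe_cmd` is invented here for illustration. The script only builds and prints the command; it does not run Sqoop.

```python
import shlex

# Output formats this sketch allows; each maps to a Sqoop "--as-<format>" flag.
# Assumption: the installed Sqoop release supports these for import-mainframe.
SUPPORTED_FORMATS = {"textfile", "sequencefile", "avrodatafile"}


def build_import_mainframe_cmd(host, dataset, target_dir, username,
                               num_mappers=4, file_format="textfile"):
    """Return an argv list for a `sqoop import-mainframe` run.

    Hypothetical helper: all argument values are supplied by the caller.
    """
    if file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import-mainframe",
        "--connect", host,                  # mainframe host name
        "--dataset", dataset,               # partitioned data set to import
        "--username", username,
        "-P",                               # prompt for the password at run time
        "--target-dir", target_dir,         # destination directory in HDFS
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
        f"--as-{file_format}",              # output file format
    ]


if __name__ == "__main__":
    # Placeholder values only; none of these names come from the announcement.
    cmd = build_import_mainframe_cmd(
        host="zos.example.com",
        dataset="HLQ.SALES.HISTORY",
        target_dir="/data/mainframe/sales_history",
        username="someuser",
        num_mappers=8,
        file_format="avrodatafile",
    )
    print(shlex.join(cmd))
```

The number of mappers controls how many map tasks pull data in parallel, which corresponds to the parallel loading the announcement describes. The printed command could be passed to `subprocess.run` on a machine where Sqoop is installed.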


The contribution also features an open application programming interface (API) that allows anyone to extend support to more complex mainframe data files. Syncsort’s own award-winning DMX-h technology uses this open API, serving as a feature-rich add-on that can handle binary sequential data with COBOL copybook metadata as well as VSAM data sets. Syncsort’s DMX-h plug-in also allows seamless archiving of mainframe data to Hadoop, preserving its original mainframe record format.


“We will continue to be one of the most prolific contributors to the Apache Hadoop family of projects, adding open source technology that helps simplify and accelerate the process of offloading of legacy workloads and data into Hadoop,” said Tendu Yogurtcu, vice president, engineering, Syncsort. “This new open source contribution extends Apache Sqoop with the ability to move Partitioned Data Sets, such as IBM DB2 dump files, from z/OS on the mainframe to Hadoop and to store the data in any Apache-Sqoop supported format.”
 
