However, the big data market has always moved at lightning speed, and it’s had a strong element of open source from the very beginning. A large proportion of what the largest Hadoop distribution vendors deliver is based on open source. So, as this article aims to demonstrate, these vendors, and those building complementary technology, will be best placed to take advantage of new big data trends such as Spark Streaming and to build solutions that bring real benefits to end customers.
Weighing up the Benefits
Traditional legacy and proprietary approaches to data integration still have their place in the burgeoning big data market. Vendors pursuing this approach often have solid products, reliable technology and well-funded development teams. However, their products are typically built on a traditional architecture that may not adapt easily to the big data environment. These products may work effectively for businesses that are doing things the way they always have. If you are an organisation with relatively straightforward requirements around data integration, data quality and extract, transform and load (ETL), and you intend to retain your existing processes and approach, there may be little need to move away from a proprietary vendor that is entrenched in your information architecture.
The difficulty often comes when organisations want to launch new projects, start new initiatives, or drive through a business transformation or product refresh. These moments raise significant concerns around cost and flexibility. If, for example, a customer were looking for additional functionality for a new project, say around big data ingestion, the licensing approach favoured by legacy vendors is likely to be an issue. Buying perpetual software licences requires a major upfront investment, and it can end up being a one-way investment: once the purchase has been made, it can be hard to modify or partially cancel it when business needs change. That has negative implications in terms of time, cost and flexibility. In addition, traditional legacy architectures are often unwieldy, and it can be difficult for businesses to adapt them to meet the demands of an evolving big data project or environment.
A flexible, subscription-licensed open source environment offers a raft of benefits for businesses that want to stretch their wings in the dynamic new world of big data and feel that legacy approaches are unlikely to give them sufficient flexibility and agility to do so. With the subscription pricing model favoured by open source vendors, organisations can reassess every year whether they want more, fewer or the same number of licences, scaling up or down as their needs change. The flexibility and cost saving this brings are very important to many businesses. Adding to that saving, organisations that build their offerings on open source can make further savings through reduced R&D costs.
Moving Beyond the Cost Argument
Compelling as the cost argument is, the move to open source is being driven by far more than cost alone. The biggest driver today is a growing recognition of the benefits of a more collaborative, partnership approach to product development, especially when it comes to driving the innovation that is so critical to big data integration. In the legacy, proprietary world, you typically have one team of developers focused on building innovation into a next-generation product set. In the open source arena, the situation is different.
Take the latest high-impact Apache big data projects, for example: multiple organisations and individuals are focused on developing each one, as well as on creating new projects. In effect, this puts multiple development teams to work on finding the best approach, which increases the pace of innovation and allows developers to pick the best of the resulting developments.
Ultimately, with an open source approach, organisations not only receive higher-quality innovation that is more fit for purpose, but they also receive it far more quickly than if they were pursuing a legacy proprietary approach. By extension, if an organisation is focused on developing an enterprise-class solution, it will be far easier to ensure that innovation lies at the heart of that product.
Such are the benefits it delivers that open source is rapidly becoming a standard approach in the big data arena. It is helping to drive innovative new technologies like Apache Spark and, subsequently, Spark Streaming, as well as fuelling emerging projects like Apache Beam. While it is relatively easy for open source vendors to support such projects, it often takes a major ‘crowbarring effort’ for legacy vendors to support them effectively. And often, by the time they do, the rest of the world has moved on.
It’s a clear example of why the case for open source is so compelling. Just as the cloud has moved from disruptive force into the mainstream, open source is now following the same path, and ever-growing numbers of businesses in the big data integration field, fully convinced of its benefits, are adopting an open-source-first approach.