Gartner’s 2020 research report, ‘Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together’, by analysts Ted Friedman and Nick Heudecker, is equal parts informative paper and warning flag. Based on incoming technology inquiries made to the firm over the last few years, the report makes it clear that serious knowledge gaps currently exist when it comes to what these three different data structures do and how businesses should be deploying them. “For example, while Gartner client inquiries referring to data hubs increased by 20% from 2018 through 2019, more than 25% of these inquiries were actually about data lake concepts.” (Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together by Ted Friedman, Nick Heudecker.)
With so many businesses already using all three of these data structures, it’s clear that their ability to properly understand and harness the unique benefits each of them offers will play a major role in overall operational success. However, Gartner’s research shows that significant investments are currently being made by executives and IT teams who don’t yet fully understand what the different structures do, or how best to combine them to achieve the most effective results.
The correct data entity is dictated by the specific business need
Each of the three data entities can be used to achieve specific business needs and objectives. Data Warehouses are used for the analysis of structured data, Data Lakes for analysis of unstructured or semi-structured data, and Data Hubs for communicating the resultant BI to employees and individuals who need it. However, many businesses wrongly believe they can all do the same things – just in different ways – thereby making them interchangeable. To vanquish this myth, it’s crucial that business leaders not only understand each entity's correct usage themselves but also educate other key stakeholders within the organisation to properly democratise the use of data.
For instance, Data Lakes and the exploratory technologies that unstructured big data enables are only as useful as a business’s ability to integrate findings into a properly structured data environment. At this point, a Data Warehouse takes over, with the Data Lake connected to it and becoming one of its many sources. Data can then be blended with a range of other sources in real-time to deliver rich business insights that would previously have gone unseen.
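As a rough illustration of that hand-off, the sketch below flattens a few semi-structured records, of the kind that might sit in a Data Lake, into a structured table and blends them with an existing customer table. It is a minimal sketch rather than a recommended design: an in-memory SQLite database stands in for the Data Warehouse, and the sample events, table names and helper functions are all hypothetical.

```python
import json
import sqlite3

# Hypothetical semi-structured events, as they might be stored in a data lake.
LAKE_EVENTS = [
    '{"customer_id": 1, "event": "page_view", "meta": {"page": "/pricing"}}',
    '{"customer_id": 2, "event": "page_view", "meta": {"page": "/docs"}}',
    '{"customer_id": 1, "event": "signup"}',
]


def flatten(raw):
    """Turn one JSON event into a fixed-shape row; a missing page becomes None."""
    record = json.loads(raw)
    page = record.get("meta", {}).get("page")
    return (record["customer_id"], record["event"], page)


def build_warehouse(lake_events):
    """Load a structured source and the flattened lake events into one database."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE web_events (customer_id INTEGER, event TEXT, page TEXT)")
    # An existing structured source (assumed sample data).
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    # The lake becomes one more source feeding the warehouse.
    conn.executemany(
        "INSERT INTO web_events VALUES (?, ?, ?)",
        [flatten(event) for event in lake_events],
    )
    return conn


def events_per_customer(conn):
    """Blend both sources: count web events per named customer."""
    query = (
        "SELECT c.name, COUNT(*) FROM customers c "
        "JOIN web_events w ON w.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(events_per_customer(build_warehouse(LAKE_EVENTS)))
```

The point of the sketch is the order of operations: the raw events only become useful for reporting once they have been given a fixed shape and joined to data the business already trusts.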
Ironically, the entity that is perhaps the most important of all tends to be the least understood by data managers. The Data Hub is where BI is not only shared but also made available for governance by those responsible for it. As its name suggests, a hub also ‘enables data flow between diverse endpoints’.
A key recommendation from Gartner’s report concerns how the three structures can work together: “Maximize your ability to support a broader range of diverse use cases by identifying the ways that these structures can be used in combination. For example, data can be delivered to analytic structures (Data Warehouses and Data Lakes) using a Data Hub as a point of mediation and governance.” (Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together by Ted Friedman, Nick Heudecker.)
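One way to picture a hub acting as a point of mediation and governance is sketched below. This is an illustrative toy, not how any particular product works: the `DataHub` class, its governance rule (every record must carry certain fields) and the endpoint names are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]


class DataHub:
    """Hypothetical hub: checks each record, then routes it to endpoints."""

    def __init__(self, required_fields: List[str]) -> None:
        self.required_fields = set(required_fields)  # assumed governance rule
        self.endpoints: Dict[str, List[Record]] = {}
        self.routes: List[Tuple[str, Callable[[Record], bool]]] = []
        self.rejected: List[Record] = []

    def register(self, name: str, accepts: Callable[[Record], bool]) -> None:
        """Add an endpoint and the rule deciding which records it receives."""
        self.endpoints[name] = []
        self.routes.append((name, accepts))

    def publish(self, record: Record) -> List[str]:
        """Apply governance first, then deliver to every matching endpoint."""
        missing = self.required_fields - set(record)
        if missing:
            self.rejected.append(record)  # held back for review, not delivered
            return []
        delivered = []
        for name, accepts in self.routes:
            if accepts(record):
                self.endpoints[name].append(record)
                delivered.append(name)
        return delivered


def demo_hub() -> DataHub:
    """Build a hub with a warehouse endpoint and a lake endpoint."""
    hub = DataHub(required_fields=["source", "owner"])
    hub.register("warehouse", lambda r: r.get("structured") is True)
    hub.register("lake", lambda r: r.get("structured") is not True)
    return hub


if __name__ == "__main__":
    hub = demo_hub()
    print(hub.publish({"source": "crm", "owner": "sales", "structured": True}))
    print(hub.publish({"source": "clickstream", "owner": "web"}))
    print(hub.publish({"source": "unknown"}))  # fails the governance check
```

In this sketch, governance happens once, at the hub, before anything reaches the analytic structures, which is the design choice the recommendation above points towards.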
Agile businesses can adapt faster
Another key takeaway from Gartner’s report is the need for businesses to be agile when it comes to ingesting and analysing data from a range of separate sources, in multiple different formats. Those that can do so will be much better positioned to adapt to sector disruption, and therefore able to monetise exciting new technology before competitors can. This notion supports the idea of using both a Data Warehouse and Data Lake in conjunction, as part of a logical Data Warehouse. It also reinforces the need for end-to-end automated infrastructure, to help simplify management and make rapid changes as required.
However, it’s important to remember that data is an ever-evolving beast. As data volumes continue to grow year-on-year, the infrastructure needed to store and analyse everything effectively is becoming more and more complex. New data sources regularly emerge, and data demands are constantly shifting. Consequently, there’s no such thing as an ‘ultimate data infrastructure’ that won’t ever need to be overhauled or upgraded. Businesses of all shapes and sizes need to recognise this, take the time to review their infrastructure regularly, and identify where changes or improvements need to be made.
Better understanding will lead to higher success rates
The purpose of this article is to help executives, IT teams and data leaders eliminate any uncertainty and build the most effective data infrastructure possible for their business. Misunderstanding in areas such as this can have major implications when it comes to expectations vs reality, particularly if those leading the business’s data department define certain infrastructure differently from the teams and individuals in charge of building and using it on a day-to-day basis.
Gartner’s report makes for extremely interesting reading and should be considered essential for anyone who has even the slightest doubt or confusion over the specific roles and purposes of Data Lakes, Data Warehouses and Data Hubs. Understanding these can be the difference between building data infrastructure that underpins the business for years to come and a failed project that quickly proves costly for everyone involved.
Author bio: Simon Spring, Account Director EMEA at WhereScape
Simon joined WhereScape nearly ten years ago and throughout this time has worked effectively with hundreds of organisations looking to utilise data analytics and data warehouse automation to transform their business.