One of the big challenges for today’s decision makers is the flood of big data – huge volumes of both structured and unstructured information, with various data types/formats and frequent data updates – and how to capture and utilise this information effectively and in a timely manner.
This will require changes in many corporate data warehousing strategies. Over the last couple of decades, IT groups have pursued the development of a single data warehouse that serves as the central repository and single source of truth for all of the (structured) data within their organisations.
Now this approach is being challenged by the meteoric increase in social media posts and a surge in non-transactional data from sources such as application and Web server logs, network monitoring devices and sensors.
The big data issues are more acutely felt in certain industries such as marketing/advertising, telecommunications, retail and financial services, and certain government activities. Understanding the relationships between data is important in areas as diverse as fraud detection, counter-terrorism, asset maintenance, energy metering, and marketing campaigns.
Organisations that see big data as an opportunity, and treat it less as a technology than as a business strategy for capitalising on information assets, will have a considerable advantage over those that lag.
What needs to change, then, is the concept of a single Relational Database Management System (RDBMS) as the only Enterprise Data Warehouse (EDW). Previously, the lack of integration between big data systems and existing business intelligence and data warehousing tools was one of the technical challenges facing IT groups.
Today, new technologies such as IBM’s Big Data platform make sourcing, merging, developing and managing big data easier. The platform also offers high availability and fault tolerance capabilities. Technology has now evolved so that big data and the data warehouse can behave as though they are a single cohesive data set.
As always with technology, it’s crucial to know exactly what you’re looking for and how to distil it. It may be a discrete snapshot, or it may be a pattern. Tools within IBM’s big data platform let you integrate various types of data sources and perform data discovery using a spreadsheet-based interface.
For example, departmental users in an organisation could use Hadoop to sift through Web data in an effort to find information that’s relevant to a particular business problem, then move that subset of data to an analytical database for more heavy-duty analysis. Once the analytical processing was complete, the aggregated results could be rolled up into a data warehouse and made available to a wider group of users.
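To make that three-stage flow concrete, here is a minimal sketch in Python. It is an illustration only: the sample records, the function names (sift, analyse, load_to_warehouse) and the page_summary table are all hypothetical, a plain list stands in for the Web data that Hadoop would hold, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3
from collections import Counter

# Hypothetical raw Web records; in practice these would sit in Hadoop.
RAW_WEB_DATA = [
    {"url": "/products/shoes", "user": "u1", "status": 200},
    {"url": "/products/shoes", "user": "u2", "status": 200},
    {"url": "/products/hats", "user": "u1", "status": 200},
    {"url": "/about", "user": "u3", "status": 200},
    {"url": "/products/hats", "user": "u4", "status": 500},
]


def sift(records, url_prefix):
    """Stage 1: keep only the successful hits relevant to the business problem."""
    return [
        r for r in records
        if r["url"].startswith(url_prefix) and r["status"] == 200
    ]


def analyse(subset):
    """Stage 2: heavier analysis, here hits and distinct users per URL."""
    hits = Counter(r["url"] for r in subset)
    users = {}
    for r in subset:
        users.setdefault(r["url"], set()).add(r["user"])
    return [
        {"url": url, "hits": hits[url], "unique_users": len(users[url])}
        for url in sorted(hits)
    ]


def load_to_warehouse(conn, rows):
    """Stage 3: roll the aggregated results up into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_summary "
        "(url TEXT PRIMARY KEY, hits INTEGER, unique_users INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO page_summary VALUES (:url, :hits, :unique_users)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stand-in for the real warehouse
    load_to_warehouse(warehouse, analyse(sift(RAW_WEB_DATA, "/products/")))
    for row in warehouse.execute("SELECT * FROM page_summary ORDER BY url"):
        print(row)
```

The point of the sketch is the division of labour: only the small, aggregated result reaches the warehouse, while the bulky raw data stays in the system best suited to sifting it.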
At Tridant we have done something similar for a large retail organisation: web traffic and ‘click-through’ metrics sourced from Google Analytics, together with Facebook and Instagram metrics, are merged into the data warehouse to create dashboards for the marketing team.
So the data warehouse lives on to drive best practices in delivering trusted data to the business, but it is now augmented by multiple data processing technologies in a well-coordinated architecture that delivers a cohesive set of data (from disparate data) to the business.
Last words from Mark Beyer at Gartner Inc. in Stamford:
“The EDW is not going away — in fact, the enterprise data warehouse itself was always a vision and never a fact. Now the vision of the EDW is evolving to include all the information assets in the organization. It’s changing from a repository strategy into an information services platform strategy.”