Friday, December 4, 2009

Access to Critical Business Intelligence: Challenging Data Warehouses?

For a long time, data warehousing used to be synonymous with business intelligence (BI), to the extent that there is a deep ingrained belief that BI cannot be conducted without a data warehouse (DW). Indeed, when companies are dealing with a deluge of data, it helps to have a DW, since it offers large corporations the ability to leverage information assets to support enterprise reporting and analysis. DWs also provide a technical solution to the problem of multiple systems, separate data stores, and rapidly expanding historical data, since information is extracted from various transaction-based systems, such as spreadsheet, enterprise resource planning ( ERP), supply chain management (SCM), or customer resource management (CRM) systems, and stored in a central repository where it is transformed, cleaned, and consolidated.

During the 1990s, this model grew to form the basis for an entire data warehousing industry, with supporting hardware, software, and consulting vendors (see The Necessity of Data Warehousing).

Many data warehousing proponents also believe that a transaction database cannot support the concurrent demands of both enterprise systems and BI applications, since the argument is that business users querying the database will decimate the performance of the entire transactional system. However, a less known detail might be that the major database vendors have meanwhile created relational database management systems (RDBMS) that are entirely capable of supporting both functions, but most of the enterprise applications and BI vendors continue to use only a small fraction of the total functionality available in the source database management systems (DBMS). Namely, their product developers tend to focus rather on cross-platform design, causing them to leverage only the limited amount of functionality that the major databases have in common.

Conversely, rather than duplicating these tools, Vanguard tries to optimally leverage them as part of the overall solution by introducing Direct Access, the technological enterprise information integration (EII)-like foundation of its Graphical Performance Series (GPS) BI solution. This solution delivers integrated enterprise information directly to business decision-makers without relying on a DW, thereby potentially saving time, increasing business agility, and reducing costs. The Vanguard GPS solution has since been able to directly access the information stored in enterprise systems, without requiring businesses to move or stage data, or invest in complex and unwieldy data warehousing technology.

Businesses are collecting an ever-expanding amount of data in multiple systems, formats, and locations, making it increasingly difficult to maintain and synchronize a redundant copy of the original data in a central repository. As the complexity of this process increases, the administrative costs go up, and the business value decreases. At the same time, the highly competitive nature of the relational database market drives continuous improvements in the functionality of the source databases.

As the processing power and functionality of the databases improve, the most logical approach might then be to size the database server to manage both the transactional and reporting workloads. Vanguard believes that it is essential to leverage effectively all of the available database capabilities, so that the GPS solution includes native data access for each database and takes full advantage of the distinct features of each. The vendor's experience has reportedly shown that properly sizing the database server to support transactional and reporting functions can be far less expensive, while adequately effective, than building and maintaining a redundant DW. Further, the IT management costs of tuning the transaction database are likely less than the cost of the ongoing maintenance required by a DW, which becomes a mission on its own, to a degree that the enterprises even forgot the original purpose of the DW.
When the DW model was conceived, it was believed that it was necessary to stage data in a warehouse to provide reasonable query performance for users. However, this is a case where query performance can be drastically improved simply by taking advantage of the features in the source databases, since Oracle, IBM, and Microsoft now offer declarative summarization capabilities that automate the process of creating and maintaining summary tables. Called materialized views, materialized query tables, or indexed views, these database features, combined with Vanguard's "summary-aware" metadata feature, can enable users to specify and deploy declarative summarization.

Vanguard makes use of this capability to limit the processing impact of user queries and maximize the performance of the solution. By incorporating declarative summarization directly into their database engines, the major database vendors might have removed, in some instances, a primary justification for data warehousing. The database heavyweights Oracle, IBM, and Microsoft are also devising ways to leverage XML and enable multi-format integration with data other than their own.

In fact, Oracle's BI solution, although featuring data warehousing and extract/transport/load (ETL) capabilities, often tends to avoid unnecessary data consolidation, given that historically, companies would dump all the information into a DW. They believed that if they put all of their data into a single database/DW they could actually eliminate the problem of functional silos and everybody would share information and be working off the same playbook.

Yet, the actual findings are too often that when a company builds a DW, the focus becomes one of getting the data from the source systems into the warehouse, whereas users are still left to their own devices to develop reports or write queries, which is a time consuming process. Thus, the right way to do it might be to keep management reporting in the same system with the transactions themselves, so that they bring along with them all the context of why the transaction happened. That approach enables users to drill into the information and explore the metrics/key performance indicators (KPI) all the way down to the very transaction that may have caused the problem. Additionally, by leaving everything in the same system or in the source instance where the transaction occurred, the users can actually drill through a transaction and fix the error on the spot if something is wrong.

Furthermore, as the Sarbanes-Oxley (SOX) and other reporting requirements hold businesses to higher standards of accuracy and accountability, business transparency is vital, but a DW, by definition, is a copy of the original data records. DWs are constantly being shifted, updated, or consolidated, and they combine data from multiple systems, each with different business rules. On the contrary, Vanguard's Direct Access pulls information directly from the transaction systems—the real systems of record—so that the question of what "version" of the data is reported becomes obsolete.

The issue is also whether there is a point of integrating a large amount of data in a DW when one only needs a small sliver of it. Given that the nature of corporate information is dynamic, trying to keep it replicated and synched in multiple databases when, for example, merged with another entity, is impractical, especially if data is accessed infrequently. Changes to a traditional DW model to bring in new data can take months, whereas EII-like data federating solutions are not as fragile as the procedural ETL scripts and can accommodate the necessary changes much more quickly.

No comments:

Post a Comment