Data Quality & Integrity

Publication Date November 2004
Publisher Butler Group
Product Type Report
Pages 176
ISBN Number not applicable
Product Code BUT00015
Price

£795.00
approximately: $1,185 | €947

PDF
PRINT £845 ($1,260 | €1,007)

Summary

Businesses have failed, and will continue to do so, because they neglect to take seriously the management and control of data quality and integrity issues. Whilst outright business failure may seem extreme, it is an indisputable fact that large organisations squander billions of Euros resolving problems fundamentally attributable to the quality (or lack of it) of their data. Such costs present themselves in a myriad of ways: lost revenue stemming from customer discontent and a failure to recognise new opportunities; wasted resources, including poor stock control, inventory management, and the shipping of wrong goods; and the knock-on costs associated with instituting remedial action and aftersales service. This is not to mention the cost of poor data quality to decision-making, planning, and strategy formulation, which, although difficult to measure, could clearly dwarf the operational impact.

Data quality and integrity issues cannot be adequately (or even satisfactorily) managed on a retrospective basis. Too many businesses are caught up in a cycle of managing the downstream impact of data quality with disproportionate resources compared to implementing a proactive and ongoing data quality strategy. Instead of throwing good money after bad in order to patch up problems as and when they arise, opportunities should be aggressively sought, targeted, and taken in order to institutionalise good data quality practices, in doing so reaping significant downstream efficiencies. Data quality needs to be driven as far upstream as is practically possible. The message here is simple - do not invest time and money trying to make spontaneous improvements; capture the problem and rectify it at source. In this regard, data quality needs to start before the actual data exists, in the form of rules, policies, constraints, and continual monitoring.
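As a rough illustration of what "rules before the data exists" can look like in practice, the sketch below declares a set of validation rules up front and applies them at the point of capture, so that a failing record never reaches the store. It is a minimal sketch under stated assumptions, not a method taken from the report: the `Rule` type, the `capture` function, and the order-record fields are all hypothetical names chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A named predicate over a candidate record (a plain dict)."""
    name: str
    check: Callable[[dict], bool]


# Hypothetical rules for an illustrative customer-order record.
# They are defined before any record is captured.
RULES = [
    Rule("customer_id present", lambda r: bool(r.get("customer_id"))),
    Rule(
        "quantity is a positive integer",
        lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0,
    ),
    Rule(
        "order_date not in the future",
        lambda r: isinstance(r.get("order_date"), date)
        and r["order_date"] <= date.today(),
    ),
]


def violations(record: dict) -> list[str]:
    """Return the names of every rule the record fails, in rule order."""
    return [rule.name for rule in RULES if not rule.check(record)]


def capture(record: dict, store: list[dict]) -> list[str]:
    """Store the record only if it passes every rule.

    Returns the list of failed rule names (empty on success), which a
    monitoring process could count over time.
    """
    failed = violations(record)
    if not failed:
        store.append(record)
    return failed
```

Because `capture` returns the names of the failed rules, the same rule set can feed continual monitoring: counting failures per rule shows where in the capture process problems originate.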

The implications and costs of poor data do not have to be endured - the issue should not be viewed as a costly exercise required, say, to mollify auditors. We firmly believe that investing in data quality can deliver a significant positive ROI, making organisations more effective with regard to decision-making and analysis, and dramatically reducing the costs associated with remedial activities.

Business Issues

The problem of data quality is not new. It is just that data is now being exposed and exploited at a much more strategic level - for example, through Business Intelligence solutions to help formulate and execute business strategy. In addition, the problem is compounded by the fact that more users are now involved in data analysis and manipulation. We have seen increased investment in front-end tools often with a blatant disregard for associating this investment with attention to data quality. This is folly - investing in BI and/or CPM technologies without first addressing data quality is not just a waste of money, it could actually do more harm than good.

In addition, compliance and regulatory issues demand that businesses improve process transparency, and ensure that information published is entirely accurate with a clear, demonstrable audit trail relating to the source of the data and calculations performed on it. Across a raft of sectors, organisations are closely evaluating their regulatory responsibilities. A common theme that runs through multiple regulations is the need to increase accountability and demonstrate fully auditable and transparent processes. It is therefore necessary to view compliance as an ongoing aspect of a well-governed business operation, which in turn means that data accuracy and validity need to be continually managed.
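One way to picture an audit trail covering both the source of the data and the calculations performed on it is to have every figure carry its own lineage. The sketch below is purely illustrative and assumes nothing about any particular product or regulation: the `Traced` type, the helper functions, and the ledger names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Traced:
    """A numeric value together with an ordered trail of how it was obtained."""
    value: float
    lineage: tuple  # human-readable trail entries, oldest first


def from_source(value: float, source: str) -> Traced:
    """Wrap a raw value and record which (hypothetical) source it came from."""
    return Traced(value, (f"read {value} from {source}",))


def add(a: Traced, b: Traced, label: str) -> Traced:
    """Add two traced values, keeping both trails and recording the calculation."""
    result = a.value + b.value
    step = f"{label} = {a.value} + {b.value} = {result}"
    return Traced(result, a.lineage + b.lineage + (step,))


# Usage: a published total can be traced back to its two source ledgers.
uk = from_source(120.0, "ledger_uk")
fr = from_source(80.0, "ledger_fr")
total = add(uk, fr, "group_revenue")
```

Here `total.lineage` lists the two source reads followed by the addition, so a reviewer can see where the published figure came from without consulting a separate log.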

In this era already characterised by aggressive litigation and tough compliance demands, it should therefore not be a case of struggling to justify an investment in data quality. It needs to be thought of as a business mandate, and opportunities to get one's house in order need to be sought now before the risks and implications become too great.

We are clearly of the opinion that a properly implemented data quality strategy will not only help the organisation achieve its regulatory requirements or service level agreements, but that it will deliver many additional benefits and ultimately a positive, long-term ROI. In particular, the following need to be seen as opportunities for improvement:

Data management: Isolated (yet potentially significant) savings from improved data management including streamlined and efficient transformation processes geared by real business drivers.
Reduced operational costs: These are typically those previously outlined, including reduced wastage and improved customer service.
More effective decision-making: The benefits of improved decision-making can be significant and long-term in nature. It is not only a question of improving those micro decisions made on a day-to-day basis; a bedrock of properly managed, quality data improves the process of strategy formulation and execution.

There is an incorrect view within organisations that data quality problems can be addressed on a case-by-case basis - that dirty or inaccurate data should be isolated and 'made good'. This approach is fundamentally flawed. Data quality has to be seen as a proactive, business-oriented strategy, not a reactive discipline of hopping from problem to problem addressing practical issues. Organisations therefore need to examine the processes by which data is created, transformed, and used, not just consider its final form. It is only by adopting this process view that we believe the business can hope to gain full control over its data, helping to deliver accurate and reliable information to users, customers, partners, and other important stakeholders, and develop streamlined, optimised processes that eliminate unnecessary wastage.

Technological Issues

In the drive for real-time information and operational agility, technologies and processes are being pushed to, and indeed beyond, their limits. Not only do organisations have to consider changes to the business environment, but they are also coming under significant internal pressure to reduce cycle times and move towards real-time information delivery. As a consequence, escalating demands are being made on various repositories (data warehouses, On-Line Analytical Processing (OLAP) cubes, etc.) and the network infrastructure charged with handling the movement of data, to the point where they become the limiting factor in the organisation's ability to respond.

A typical reaction to data quality problems is to blame IT. This is not helpful. Data quality needs to be seen as a business problem, not an IT problem. However, addressing this 'business problem' will doubtless require assistance in the form of technology solutions, combining dedicated software and hardware.

Technologies involved in the management of the data lifecycle include data analysis and cleansing, data Extraction, Transformation, and Load (ETL), transactional and relational repositories, and to a certain extent, OLAP and BI analysis techniques.

Data cleansing software undoubtedly plays an important role; however, it is vital that data quality is viewed as an ongoing process, not a series of 'pot shots'. Our observation is that many organisations rely too heavily on the ongoing use of data cleansing tools. It is extremely important to acknowledge where the problems have been identified and rectify them at the source, not to use software to continually patch over the same underlying problem.

The use of ETL tools and data warehouses has blossomed over the past decade as organisations sought the opportunity to perform wonderfully meaningful mappings between disparate application repositories. Unfortunately, most of these attempts were just that; good efforts at resolving what was, and still remains, a complex problem. Suffice to say that many organisations went through the painful process of transforming and mapping their data and loading it into a suitable and expensive warehouse, only to find that their quest for valuable information had fallen a long way short.

ETL and end reporting require as much attention as the database design. It is a chain, but actually the ETL and reports are more important to get right: if the data warehouse design is wrong it will slow the system, if the ETL is wrong it will be 'garbage in', and if the data mining analysis and reporting is missing, inadequate, or unused (or more likely the right users are not aware of it), then ROI is zero.

One lesson the industry has learnt is that data source specifications will inevitably change as new sources are added and existing sources redefined. Keeping up with these changes is too time-consuming for hard-coded ETL solutions. An important benefit of commercial products is the in-built metadata management of sources, targets, and transformations. This allows metadata to be made consistent within the organisation as a whole and prevents the data warehouse from becoming an isolated function within a department (the 'stove-pipe' syndrome), providing better integration with other departments.
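The contrast between hard-coded ETL and metadata-managed ETL can be sketched in a few lines. In the illustration below, the source fields, target fields, and transformations live in a mapping table rather than in the transformation code, so a redefined source means editing metadata only. This is a minimal sketch, not a depiction of any commercial product: the `MAPPING` table, the `transform` function, and all field names are assumptions made for the example.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical mapping metadata:
# target field -> (source field, transformation applied to the source value).
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

MAPPING: Mapping = {
    "customer_name": ("CUST_NM", str.strip),
    "country_code": ("CTRY", str.upper),
    "order_total": ("TOTAL", float),
}


def transform(source_row: Dict[str, Any], mapping: Mapping) -> Dict[str, Any]:
    """Build a target row by applying each mapped transformation in turn.

    A missing source field raises KeyError, so a changed source
    specification fails loudly instead of loading partial data.
    """
    target = {}
    for target_field, (source_field, convert) in mapping.items():
        target[target_field] = convert(source_row[source_field])
    return target


# If the source system renames CUST_NM to CUSTOMER_NAME, only the
# metadata changes; transform() itself is untouched.
MAPPING_V2: Mapping = {**MAPPING, "customer_name": ("CUSTOMER_NAME", str.strip)}
```

Because the mapping is ordinary data, it can also be inspected or shared across departments, which is the kind of consistency the paragraph above attributes to in-built metadata management.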

Data warehouse technology has matured over the past decade to the point where users are now faced with such a bewildering choice of implementation approaches and such an impenetrable thicket of jargon that it is difficult to decide the right way to tackle data warehousing applications. The latest industry buzz centres on Enterprise Information Integration (EII) as a mechanism for rapidly accessing data for analysis and reporting purposes. There is nothing new in this - the concept of the virtual data warehouse has been around for years, and EII should be treated with the exact same caution.

At the most fundamental level, that of data, the organisation has to understand the limits, restrictions, and possibilities of what can be achieved. In this regard, there is still a generally poor appreciation of basic data principles.

Metadata modelling and a metadata architecture are important considerations in controlling data and hence data quality. Managing metadata within a data warehouse environment is clearly defined by the environmental constraints. The concept of managing metadata outside of a data warehouse environment has the same advantages and follows the same principles as metadata management within such an environment. Given the nature of metadata itself, there is no reason why such tools cannot access an implemented metadata layer outside of a 'typical' data warehouse implementation.

Market Issues

The broad market for data quality software and solutions is therefore made up of a range of different vendors and technologies from the following areas:

  • Data cleansing.
  • ETL.
  • Data warehousing.
  • OLAP and BI.

Depending on its heritage, each market segment is likely to have a slightly different view on data quality and naturally claims its own area to be the most critical. In reality, no single area should be thought of as being more important or critical than another - the fact of the matter is that it will depend entirely on the needs and considerations of each organisation.

From a BI perspective, a significant portion of the vendor market has to be criticised for its lack of attention to data quality in the past. It seems that this market was only too happy to sell customers software to allow them to 'make more effective decisions' without really taking any responsibility for the quality of the data. Data quality is an imperative precursor to the appropriate use of front-end BI tools. As a consequence, we are seeing examples of many organisations reviewing their BI strategy with a view to consolidating the number of vendors involved, migrating to a centrally-controlled yet widely-deployed Web-based architecture, and focusing on the creation and maintenance of data quality for reporting and analysis purposes.

Providers of ETL software have generally found trading quite tough of late. BI vendors are, by and large, moving to incorporate ETL functionality within their portfolio and the main database vendors are developing their own capabilities. However, the demand for data integration and management solutions will never go away - so fragmented and isolated are the architectures and IT infrastructures of modern business that we will always need some kind of 'glue' to bring all the pieces together.

Data cleansing and data analysis form an important part of the data quality picture - indeed when you mention data quality, most people would automatically think of the analysis of large data sets, highlighting of anomalies, and subsequent cleansing. However, this is just part of the process; it is vital that the issues and procedural irregularities thrown up by the analysis and cleansing process are retrospectively embedded into the data lifecycle, for example by amending data validation rules, or the reworking of an ETL process. In this regard, vendors of data analysis and data cleansing tools need to find points of integration with the other technologies used to manage and move data.

No single force or driver currently dominates the market for data quality solutions; the effect is more subtle and cumulative than that. However, this spectrum of drivers weaves together to create an almost inescapable shroud - most organisations will feel the effect of at least one of these issues, whilst countless others will be wrestling with several issues simultaneously. Specific market drivers helping to promote or isolate the importance of data quality include:

Accountability: Forget for a second the billions of pounds that must be lost each year through the operational implications of bad data, in stock overruns or shortfalls, accounting slip-ups, and remedial customer service. Institutions are now expected to have and use the right data and to demonstrate that management is in control of every process. We feel that this level of accountability is both appropriate and required to deliver data quality-related savings and help the business achieve its broader governance goals.
Compliance and legislation: Responsible organisations should look to develop best practices for achieving and sustaining an appropriate level of data quality, as outlined above. Organisations flouting their regulatory requirements face fines, public humiliation at the hands of the media, and even prison sentences for executives in extreme situations. The issue of compliance is therefore forcing many organisations to consider the impact of data quality on their ability to comply.
Data for ROI, BI, and CPM: The basic premise of data quality and its relation to BI and CPM is simple. No single piece of data or information should be surfaced via such a solution unless it is accurate and reliable. The simple fact of the matter is that organisations have focused their efforts on investment in front-end data analysis and reporting solutions, making the most of the latest Web-based architectures to supplant their limited client-server implementations, to the detriment of back-end data foundations. The result is inaccurate information, leading to flawed analyses and decisions, and ultimately skewed execution.
Supply Chain Management (SCM): For businesses to buy and sell to other businesses effectively, there needs to be close integration between people and systems. The nirvana is highly automated, inter-business processes, with dynamic and effective sharing of data and information throughout the supply chain. As a consequence, data quality is playing a pivotal role in SCM deployments.
Mergers and acquisitions: Examples where a merger or acquisition has been held up or undermined by the inability of the two businesses to co-ordinate and integrate their disparate working practices, technologies, and most fundamentally, their data, are all too numerous. The joint organisation quickly needs to be seen as a single entity, both from external and internal perspectives, and this naturally requires a shared understanding of data definitions and data quality policies.

Key Findings

  • Data integrity and data quality are not the same. Whilst quality concerns the accuracy, currency, and precision of specific data, integrity is related to how data maintains its conformity to rules and constraints over time.
  • Businesses have failed, and will continue to do so, because they neglect to take seriously the management and control of data quality and integrity issues.
  • The impact of data quality on decision-making, planning, and strategy formulation, although difficult to measure, could clearly dwarf any direct operational impact.
  • Data quality and integrity issues cannot be adequately (or even satisfactorily) managed on a retrospective basis.
  • Too many businesses are caught up in a cycle of managing the downstream impact of data quality with disproportionate resources compared to implementing a proactive, ongoing data quality strategy. In short, data quality has to start before the physical data actually exists: prevention is better than the cure.
  • Compliance and regulatory issues demand that businesses improve process transparency, and ensure that information published is entirely accurate with a clear, demonstrable audit trail relating to the source of the data and calculations performed on it.
  • The era of batch data warehouse updates is over. The pressure of 24x7 operations and the customer demand for real-time information access cannot be adequately supported by a batch approach.
  • The basic premise of data quality and its relation to Business Intelligence (BI) and Corporate Performance Management (CPM) is simple. No single piece of data or information should be surfaced via such a solution unless it is accurate and reliable.
  • We firmly believe that investing in data quality can deliver a significant positive Return On Investment, making organisations more effective with regard to decision-making and analysis, and dramatically reducing the costs associated with remedial activities.
  • A typical response to data quality problems is to blame the IT department. Not only is this not helpful, it is not actually correct. Data quality needs to be seen as a business problem, not an IT problem.
  • If we were fast-forwarded ten years, today's levels of lackadaisical reporting, accounting, and publication of data would be laughed at, so ingrained in reputable business and governance will data quality become.
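
The distinction drawn in the first key finding, between integrity (conformity to rules and constraints) and quality (accuracy, currency, and precision), can be illustrated with a small database sketch. The example below uses Python's standard `sqlite3` module; the table layout and the `try_insert` helper are hypothetical and chosen only for illustration. The database rejects rows that break its constraints, yet it happily accepts a row that satisfies every constraint and is simply wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer(id),
           quantity INTEGER NOT NULL CHECK (quantity > 0)
       )"""
)
conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Acme Ltd')")
conn.commit()


def try_insert(customer_id: int, quantity: int) -> bool:
    """Insert an order; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)",
                (customer_id, quantity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Integrity: the constraints reject structurally invalid rows.
valid_order = try_insert(1, 5)        # accepted
unknown_customer = try_insert(99, 5)  # rejected: no such customer
zero_quantity = try_insert(1, 0)      # rejected: CHECK (quantity > 0)

# Quality: suppose the customer actually ordered 5 units but 500 was keyed in.
# Every constraint is satisfied, so the inaccurate row is stored regardless.
mistyped_order = try_insert(1, 500)   # accepted, although inaccurate
```

Constraints of this kind protect integrity over time, but accuracy still depends on the capture process and on ongoing monitoring.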

Content

  • Section 1: Management Summary
    • 1.1 Management Summary
  • Section 2: Business Issues
    • 2.1 Report Structure
    • 2.2 The Size of the Problem
    • 2.3 Data Quality And Integrity - Ignore at Your Peril!
    • 2.4 Data Integrity and Integration
    • 2.5 The Role of Metadata
    • 2.6 Conclusions
  • Section 3: The Data Lifecycle - From OLTP To OLAP And Beyond
    • 3.1 Introducing The Data Lifecycle
    • 3.2 Technologies Used in Data Quality and Integrity
    • 3.3 Managing Metadata
    • 3.4 Butler Group Model Of Data Quality And Integrity
    • 3.5 Conclusions
  • Section 4: Developing a DQI Strategy
    • 4.1 Why Organisations Need a Strategy for Data Quality
    • 4.2 Components of a Data Quality Strategy
    • 4.3 Cultural Considerations
    • 4.4 Technology Considerations
    • 4.5 Butler Group Deployment Roadmap
    • 4.6 Conclusions
  • Section 5: Market Evaluation
    • 5.1 Market Overview
    • 5.2 Segmenting the Market
    • 5.3 Market Drivers
    • 5.4 Market Trends and Future Developments
    • 5.5 Conclusions
  • Section 6: Product Profiles
    • 6.1 Introduction
    • 6.2 Extract, Transform, and Load (ETL)
    • 6.3 Data Cleansing
    • 6.4 Data Warehousing
    • 6.5 On-Line Analytical Processing (OLAP)
    • 6.6 Selected Case Studies
  • Section 7: Glossary