A data processing system cleanses and transforms point-of-sale data gathered daily from hundreds of retail stores. The data processing system is being replaced by newer, more robust technology so that the point-of-sale data can be gathered more reliably from more stores and delivered to more destinations.
A quality assurance team has been charged with testing whether the output of the new system matches the output of the old system. Any discrepancies must be fixed, or, in cases where the new technology has improved the data quality, the differences must be documented and explained to the end users of the data. The processing systems generate over 200 MB of information (30,000 records on average) at each of five stages of processing. The QA team has no automated means of comparing the data output of each stage. Each test run takes two weeks, and as a result, the system release date is slipping.
I was hired to help the QA team automate the analysis of the data output.