Effectively Mining and Using Coverage and Overlap Statistics for Data Integration
Overview

Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping their number low enough that storage and learning costs remain manageable; naive approaches quickly become infeasible. This paper presents a set of connected techniques that estimate the coverage and overlap statistics while keeping the number of needed statistics tightly under control. The approach uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics.
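To make the terms in the overview concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes that the coverage of a source for a query is the fraction of the query's known answers that the source returns, and that the overlap of two sources is the fraction of answers both return. The function names, the `min_share` threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def coverage_and_overlap(answers_by_source):
    """Estimate per-source coverage and pairwise overlap for one query.

    answers_by_source maps a source name to the set of answer tuples
    that source returned. The union of all returned answers stands in
    for the full answer set (an assumption of this sketch).
    """
    all_answers = set().union(*answers_by_source.values())
    total = len(all_answers)
    if total == 0:
        return {}, {}
    coverage = {s: len(a) / total for s, a in answers_by_source.items()}
    overlap = {
        (s1, s2): len(answers_by_source[s1] & answers_by_source[s2]) / total
        for s1, s2 in combinations(sorted(answers_by_source), 2)
    }
    return coverage, overlap


def frequent_classes(class_counts, min_share):
    """Keep only query classes whose share of the workload meets a threshold.

    This mimics the idea of threshold-based pruning: statistics are stored
    only for classes that are asked about often enough to be worth the cost.
    """
    total = sum(class_counts.values())
    if total == 0:
        return set()
    return {c for c, n in class_counts.items() if n / total >= min_share}


# Hypothetical data: two sources answering the same query.
answers = {"S1": {1, 2, 3}, "S2": {3, 4}}
coverage, overlap = coverage_and_overlap(answers)

# Hypothetical workload counts per query class.
kept = frequent_classes({"databases": 80, "theory": 15, "graphics": 5}, 0.1)
```

With the sample data, `S1` covers three of the four known answers and shares one answer with `S2`, and the `graphics` class falls below the 10% threshold, so no statistics would be kept for it. In a hierarchical setting, queries in a pruned class would presumably fall back to statistics held for a more general ancestor class.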
| Field | Value |
|---|---|
| Publisher | Microsoft |
| Date Published | October 2004 |
| Format | White Papers |
| File Format | |
| Topics | |



