White Papers

Assessing Deduplication and Data Linkage Quality: What to Measure?

Overview: Deduplicating one data set or linking several data sets are increasingly important tasks in the data preparation steps of many data mining projects. The aim of such linkages is to match all records relating to the same entity. This paper presents an overview of the issues involved in measuring deduplication and data linkage quality, and shows that measures in the space of record pair comparisons can produce deceptive accuracy results. Various measures are discussed, and recommendations are given on how to assess deduplication and data linkage quality.
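The point about deceptive accuracy in the space of record pair comparisons can be illustrated with a minimal sketch. It is not taken from the paper: the function name `pair_space_measures` and the example counts are hypothetical, chosen only to show how the very large number of true non-matching pairs can dominate an accuracy figure while precision and recall stay low.

```python
from math import comb


def pair_space_measures(n_records, tp, fp, fn):
    """Pair-level quality measures for deduplicating one data set.

    n_records: number of records in the data set
    tp, fp, fn: true positive, false positive and false negative
                record pairs (hypothetical counts for illustration)
    """
    # Deduplicating n records means n * (n - 1) / 2 possible record pairs.
    total_pairs = comb(n_records, 2)
    # Every pair that is neither a true match nor predicted as one
    # is a true negative; this count is usually enormous.
    tn = total_pairs - tp - fp - fn

    accuracy = (tp + tn) / total_pairs
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "total_pairs": total_pairs,
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
    }


# Assumed scenario: 1,000 records containing 100 true duplicate pairs.
# The classifier finds only 20 of them and adds 30 false matches.
result = pair_space_measures(n_records=1000, tp=20, fp=30, fn=80)
for name, value in result.items():
    print(f"{name}: {value}")
```

In this assumed scenario accuracy comes out above 0.999 even though only a fifth of the true duplicate pairs are found, because 499,370 of the 499,500 pairs are true negatives. Measures that leave out the true negatives, such as precision and recall, are not inflated in this way.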

Further White Paper Details
Publisher: Australian National University
File Format: PDF
Date Published: January 2007
Format: White Papers
