Introducing probabilistic record linkage algorithms
In addition to approaches like the LSH seen above, there are probabilistic record linkage algorithms, which estimate the probability that two records refer to the same entity. They are designed specifically for fuzzy matching or linking records from different datasets. They typically calculate similarities between pairs of records across several attributes (e.g., names and addresses) and assign weights or probabilities to these similarities. The algorithm then combines these values to decide whether two records match.
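To make the idea concrete, here is a minimal sketch in Python of how per-attribute comparisons can be turned into a match probability. Everything in it is an assumption made for illustration: the field names, the `PARAMS` values (the probability that an attribute agrees among true matches, `m`, and among non-matches, `u`), the 0.85 similarity cut-off, and the 1% prior are hypothetical and not estimated from any data. String similarity comes from the standard library's `difflib.SequenceMatcher`; a real project would likely use a dedicated comparison function and learn the parameters from the data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-attribute parameters (illustrative values only):
#   m = P(attribute agrees | the two records are a true match)
#   u = P(attribute agrees | the two records are not a match)
PARAMS = {
    "name": {"m": 0.95, "u": 0.01},
    "address": {"m": 0.90, "u": 0.05},
    "city": {"m": 0.98, "u": 0.20},
}


def agrees(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two strings as agreeing when their similarity ratio is high enough."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


def match_probability(rec_a: dict, rec_b: dict, prior: float = 0.01) -> float:
    """Combine per-attribute evidence into a probability that the records match.

    Starts from the prior log-odds of a match and adds one log-likelihood
    ratio per attribute, assuming the attributes are independent.
    """
    log_odds = math.log(prior / (1 - prior))
    for field, p in PARAMS.items():
        if agrees(rec_a[field], rec_b[field]):
            log_odds += math.log(p["m"] / p["u"])
        else:
            log_odds += math.log((1 - p["m"]) / (1 - p["u"]))
    # Convert log-odds back to a probability.
    return 1 / (1 + math.exp(-log_odds))


record_a = {"name": "Jonathan Smith", "address": "12 Main Street", "city": "Boston"}
record_b = {"name": "Jonathon Smith", "address": "12 Main Street", "city": "Boston"}
record_c = {"name": "Maria Garcia", "address": "98 Ocean Avenue", "city": "Boston"}

likely_match = match_probability(record_a, record_b)
unlikely_match = match_probability(record_a, record_c)
print(f"a vs b: {likely_match:.3f}")
print(f"a vs c: {unlikely_match:.5f}")
```

With these assumed parameters, the pair that differs only by a one-letter spelling variation in the name receives a high probability, while the pair that shares only the city receives a very low one. A decision rule would then compare the probability against one or two thresholds chosen for the application.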
One of the best-known algorithms of this type is the Fellegi-Sunter model, proposed by Ivan P. Fellegi and Alan B. Sunter in 1969. It has become a fundamental approach to probabilistic record linkage. The algorithm typically involves several steps:
- Comparison of attributes: The algorithm compares the values...