
I have a list of captions with a large number of near-duplicates. For example:

  • Birthday for Her
  • For Her Birthday
  • Birthday - For Her
  • For Her / Birthday

I was looking into Fuzzy Lookup as a way of highlighting these near-duplicates.

Tim

1 Answer


I was looking into Fuzzy Lookup as a way of highlighting these near-duplicates

The Fuzzy Lookup Add-In for Excel performs fuzzy matching of textual data in Excel.


Fuzzy Lookup Add-In for Excel

The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel.

It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data.

For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match.

While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages.

Source: Fuzzy Lookup Add-In for Excel
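To make the idea of finding fuzzy duplicates within a single table concrete, here is a minimal Python sketch. It is not the add-in's implementation: it assumes a simple word-set (Jaccard) similarity, and the names `tokens`, `jaccard`, and `near_duplicates`, the 0.8 threshold, and the extra "Get Well Soon" caption are all made up for illustration.

```python
import re
from itertools import combinations


def tokens(caption):
    """Lowercase the caption and keep only its words, ignoring punctuation and order."""
    return frozenset(re.findall(r"[a-z0-9]+", caption.lower()))


def jaccard(a, b):
    """Share of words the two captions have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty captions are treated as identical
    return len(ta & tb) / len(ta | tb)


def near_duplicates(captions, threshold=0.8):
    """Compare the list to itself and return pairs scoring at or above the threshold."""
    result = []
    for a, b in combinations(captions, 2):
        score = jaccard(a, b)
        if score >= threshold:
            result.append((a, b, score))
    return result


captions = [
    "Birthday for Her",
    "For Her Birthday",
    "Birthday - For Her",
    "For Her / Birthday",
    "Get Well Soon",  # hypothetical unrelated caption
]

pairs = near_duplicates(captions)
for a, b, score in pairs:
    print(f"{score:.2f}  {a!r} ~ {b!r}")
```

Because the four birthday captions contain the same three words, every pair among them is flagged, while the unrelated caption is left alone. A word-set measure like this ignores word order and separators, which is the kind of variation in the question; it would not catch spelling mistakes.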


Any suggestions on the Similarity Threshold configuration?

Performing Fuzzy Lookups in Excel has some hints on Similarity Threshold configuration.
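As a rough illustration of what a similarity threshold does in general, the sketch below scores every pair in a small list once and then counts how many pairs survive at several cut-offs. It assumes a word-set (Jaccard) score rather than whatever the add-in computes internally, and the captions, the helper names `similarity` and `matches_at`, and the threshold values are all hypothetical.

```python
import re
from itertools import combinations


def token_set(text):
    """Words of the caption, lowercased, with punctuation dropped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a, b):
    """Jaccard score: shared words divided by all distinct words."""
    ta, tb = token_set(a), token_set(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


# Hypothetical captions chosen so that the pairs get different scores.
captions = [
    "Birthday for Her",
    "Birthday for Him",
    "Birthday Wishes for Her",
    "Get Well Soon",
]

# Score each pair once, highest score first.
scored = sorted(
    ((similarity(a, b), a, b) for a, b in combinations(captions, 2)),
    reverse=True,
)


def matches_at(threshold):
    """Pairs whose score is at or above the given threshold."""
    return [(a, b) for score, a, b in scored if score >= threshold]


for threshold in (0.9, 0.7, 0.45, 0.3):
    print(threshold, len(matches_at(threshold)))
```

A stricter threshold returns fewer, safer matches, and a looser one starts pairing captions that merely share common words such as "Birthday for". Starting high and lowering the value until false matches appear is one reasonable way to pick a setting for a given list.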

DavidPostill
  • My table of captions is a single column in alphabetical order, so I want to compare the table to itself to find the near-duplicates. Most of the examples I have seen online use two different tables. Is there an example of how to configure the Lookup to compare a single table to itself? – Tim Jun 11 '15 at 18:41
  • Not that I know of. Have you tried duplicating the table in a second column and then comparing the original to the duplicate? – DavidPostill Jun 11 '15 at 18:43
  • That may be the trick! I will try that and let you know. Any suggestions on the Similarity Threshold configuration? Thanks! – Tim Jun 12 '15 at 13:05
  • Searching for "Fuzzy Lookup Add-In for Excel examples" gives some links you could investigate. – DavidPostill Jun 12 '15 at 13:28
  • http://www.k2e.com/tech-update/tips/431-tip-fuzzy-lookups-in-excel has some hints on Similarity Threshold configuration. – DavidPostill Jun 12 '15 at 13:31