Similarity algorithms

I have recently been researching a record linkage techniques, and part of this process I have been reminding myself of certain algorithms, and in other case learning these for the first time. As is my way, I typically try and turn the algorithm into code to allow me to understand and learn it. I have coded up examples of the following algorithms in Java:

Levenshtein
Damerau-Levenshtein
Hamming
Jaro Winkler
N-Gram
Markov Chain

Out of curiosity I decided to publish each one separately so I can see from the stats which is the most popular. In a month or so I will add a histogram of the hits for each one (if there are any!).

I hope this is useful.

Be warned, I offer no warranty or guarantee on this code, any changes / enhancements / corrections / or use of this code should be attributed to itssmee, CHIME, UCL (where I am doing my research) and shared with the community.

Advertisements

Tags: , , , , , , , ,

2 Responses to “Similarity algorithms”

  1. Statistics on the similarity algorithms « It's Smee Blog Says:

    […] on the similarity algorithms By itssmee So, just under a month ago I posted the following on similarity […]

  2. mush Says:

    hi thanks for the code, its really useful.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: