Posts Tagged ‘Hamming’

Similarity algorithms

June 28, 2010

I have recently been researching a record linkage techniques, and part of this process I have been reminding myself of certain algorithms, and in other case learning these for the first time. As is my way, I typically try and turn the algorithm into code to allow me to understand and learn it. I have coded up examples of the following algorithms in Java:

Levenshtein
Damerau-Levenshtein
Hamming
Jaro Winkler
N-Gram
Markov Chain

Out of curiosity I decided to publish each one separately so I can see from the stats which is the most popular. In a month or so I will add a histogram of the hits for each one (if there are any!).

I hope this is useful.

Be warned, I offer no warranty or guarantee on this code, any changes / enhancements / corrections / or use of this code should be attributed to itssmee, CHIME, UCL (where I am doing my research) and shared with the community.

Advertisements

Java example of Hamming distance

June 28, 2010

This algorithm calculates the distance between two strings, however they have to be of equal length.

It measures the minimum number of substitutions for the two strings to be equal.

It is used in telecommunication (also know as signal distance), it is also used in systematics as a measure of genetic distance.

Wikipedia entry can be found here:


public class Hamming
{
    private String compOne;
    private String compTwo;

    public Hamming(String one, String two)
    {
        compOne = one;
        compTwo = two;
    }

    ///
    //  Calculating the Hamming Distance for two strings requires the string to be of the same length.
    ///
    public int getHammingDistance()
    {
        if (compOne.length() != compTwo.length())
        {
            return -1;
        }

        int counter = 0;

        for (int i = 0; i < compOne.length(); i++)
        {
            if (compOne.charAt(i) != compTwo.charAt(i)) counter++;
        }

        return counter;
    }

    ///
    //  Hamming distance works best with binary comparisons, this function takes a string arrary of binary
    //  values and returns the minimum distance value
    ///
    public int minDistance(String[] numbers)
    {
        int minDistance = Integer.MAX_VALUE;

        if (checkConstraints(numbers))
        {
            for (int i = 1; i < numbers.length; i++)
            {
                int counter = 0;
                for (int j = 1; j <= numbers[i].length(); j++)
                {
                    if (numbers[i-1].charAt(j-1) != numbers[i].charAt(j-1))
                    {
                        counter++;
                    }
                }

                if (counter == 0) return counter;
                if (counter < minDistance) minDistance = counter;
            }
        }
        else
        {
            return -1;
        }

        return minDistance;
    }

    private Boolean checkConstraints(String[] numbers)
    {
        if (numbers.length > 1 && numbers.length <=50)
        {
            int prevLength = -1;
            for (int i = 0; i < numbers.length; i++)
            {
                if (numbers[i].length() > 0 && numbers[i].length() <= 50)
                {
                    if (prevLength == -1)
                    {
                        prevLength = numbers[i].length();
                    }
                    else
                    {
                        if (prevLength != numbers[i].length())
                        {
                            return false;
                        }
                    }

                    for (int j = 0; j < numbers[i].length(); j++)
                    {
                        if (numbers[i].charAt(j) != '0' && numbers[i].charAt(j) != '1')
                        {
                            return false;
                        }
                    }
                }
                else
                {
                    return false;
                }
            }
        }
        else
        {
            return false;
        }

        return true;
    }
}