Hamming Distance using a dictionary

Question

First time poster. I'm currently programming a hamming distance program using a dictionary.

So if we start with the word "cat" and we want to transform it into "dog", the sequence is cat-->cot-->dot-->dog, and "full" to "tree" is full-->fuel-->feel-->teel-->tyee-->tree. All intermediates must exist in a dictionary. I've implemented that part and it works quite well. My question is about the fringe cases. Some words have no transformations, such as dragon and cosmo. But that's fine, I implemented a quick check.

The real problem is there are lots of paths in which transformations are impossible and I can't seem to find a way to check whether they are impossible. For example, heist to flame is heist-->feist-->feast-->feaseX --> flame. Getting to the end point is impossible and an infinite loop occurs.

My possible options are that after around 20 transformations, I stop the calculation and throw an error, which is fine. Or, there might be way to detect a dead path before the program is ran, this is the best option, I'm not optimistic this is possible.

Now you know why Hamming Distance is usually calculated numerically, rather than using a dictionary. — Robert Harvey, Commented Jul 7, 2015 at 17:10
Note this is the word chain problem, not Hamming distance (the Hamming distance is 3, the number of symbols that differ). — outis, Commented Jul 8, 2015 at 21:44
Related: "Shortest path to transform one word into another", "Algorithm to transform one word to another through valid words", "How to compute 'shortest distance' between two words?", "Best approach to solve Word Chain". — outis, Commented Jul 11, 2015 at 1:15

Karl Bielefeldt · Accepted Answer · 2015-07-08 13:02:48Z

You can solve this with a standard breadth-first search, with the addition of also storing the path taken to a node whenever you enqueue it in the to search queue. BFS algorithms avoid loops by keeping a visited or discovered set. Because they use a queue instead of a stack, they are not normally implemented using recursion, unless the implementation uses immutable data structures.

If you reach a point in the algorithm where you have nothing remaining in the to search queue, you know there is no possible path. Note at this point, the visited set contains all the nodes reachable from the given starting point. You can take advantage of that to precalculate and store all the sets of connected words, by just skipping the part where you check if you reached the goal word. However, it's simpler and may often work out faster to just run the BFS.

Although the way I did it was if all paths coverged on the same node, then the algorithm would end. — Siqi Liu, Commented Jul 8, 2015 at 15:23

PeterAllenWebb · Accepted Answer · 2015-07-08 13:35:36Z

It seems that what you're really trying to do is figure out if two nodes in a graph are in the same connected component. Specifically the graph is an undirected one in which the words of your dictionary are nodes, and two nodes are connected to each other by a vertex when they differ by exactly one letter, and are the same length. The root of the problem is that this graph is not be totally connected. It may be a disconnected graph of mutually unreachable connected components. In fact, it seems from your description that any words with a different number of letters would have to be in separate components.

The italicized keywords above are too technical to explain completely in an answer here, but you will find them on Wikipeida, or in any computer science text that covers graph algorithms. The algorithms you should consider for solving your problem are reachability algorithms for disconnected, undirected graphs.

Robert Harvey · Accepted Answer · 2015-07-07 17:09:13Z

1

To solve the question you pose, you must understand complete search, and for that you have to understand recursive algorithms. This simply means that you must reduce e.g. the problem of reaching "tree" from "full" to the problem of reaching "free" from "full", and handle the intermediate data correctly.

edited Jul 7, 2015 at 17:09

Robert Harvey

200k55 gold badges467 silver badges678 bronze badges

answered Jul 7, 2015 at 17:08

Kilian Foth

110k45 gold badges300 silver badges317 bronze badges

1

Could you go into a bit more detail? I'm interested
– Siqi Liu
Commented Jul 7, 2015 at 18:08

Add a comment |

Alexus · Accepted Answer · 2015-07-07 18:47:48Z

I assume you are using recursion for this. Every transformation is a unit of work to be done, hopefully it is handled in one method, or group of methods.

When you jump to next word, keep track of all the words that you have already been at during this specific path and check the available options against the list of those words to exclude the ones that you already visited.

Example:

heist-(1)->feist-(2)->feast-(3)->feaseX -(4)-> flame

* Numbers signify the execution cycle of the same method. Simplified version for demonstration purposes.

public string[] previouslyUsedWords = new String[]();

public string transformNext(string currentWord) {
    var possibleOptions = _optionsProvider.GetOptions(currentWord);

    foreach(var word in possibleOptions) {
        if(!previouslyUsedWords.Contains(word)) {
            previouslyUsedWords.Add(word);
            return word;
        }
    }

    throw new Exception("No options left, this is a dead end!");
}

Combine this with a tree-traversal algorithm(some paths are dead ends and that's why you need to go back and try other paths) and you should accomplish this task in no time.

Actually if you try to do depth search, you don't get the shortest distance. Going breadth search might work, but how would I implement a tree-traversal algorithm? Binary trees don't work. — Siqi Liu, Commented Jul 7, 2015 at 23:14
Execution path will rarely be shortest distance, however the output you provide should be. That's why I am saying to keep track of nodes you have visited to avoid loops. Also, keep track of nodes that successfully let you navigate closer to your target word. If you find dead end, go back and remove the success nodes from the second list up to the moment where you still have options to explore further. If no options are left, it's a dead end. Binary tree obviously doesn't work, as you have non-binary tree here. — Alexus, Commented Jul 7, 2015 at 23:21

NovaDenizen · Accepted Answer · 2015-07-08 02:20:11Z

I think you should probably use a union-find algorithm to find all connected groups of n-letter words that can transform into each other. Preprocess you dictionary with it, and before you start searching see if the two end words are in the same connected group. The preprocessing will only take time proportional to the number of valid word-to-word transitions, so you could get away with doing it every time you load your dictionary. You would need to store the group id of every word, so that would take a little more space.

Stack Exchange Network

Hamming Distance using a dictionary

5 Answers 5

Not the answer you're looking for? Browse other questions tagged
algorithms
math
or ask your own question.

Linked

Hot Network Questions

Hamming Distance using a dictionary

5 Answers 5

Not the answer you're looking for? Browse other questions tagged algorithmsmath or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
algorithms
math
or ask your own question.