Word Ladder
Problem Statement
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
- Every adjacent pair of words differs by a single letter.
- Every
si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
sk == endWord
Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.
Additional information
1 <= beginWord.length <= 10
endWord.length == beginWord.length
1 <= wordList.length <= 5000
wordList[i].length == beginWord.length
beginWord, endWord, and wordList[i] consist of lowercase English letters.
beginWord != endWord
- All the words in
wordList are unique.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.
Brute Force (BFS with Pairwise Comparison)
Intuition
We are looking for the shortest path between a starting word and an ending word. Finding the shortest path in an unweighted graph is the textbook use case for Breadth-First Search (BFS).
In our graph, every word is a node, and an edge exists between two words if they differ by exactly one character. In a brute-force approach, to find the "neighbors" of the word we just popped from our BFS queue, we can iterate through the entire wordList and compare every single character to see if they differ by exactly 1 letter.
Algorithm
- If
endWord is not in wordList, return 0 immediately.
- Create a helper function
is_adjacent(word1, word2) that returns True if the words differ by exactly 1 character.
- Initialize a
deque (queue) with [beginWord] and a visited set with beginWord.
- Initialize a
res counter to 1 (since the sequence length includes the starting word).
- While the queue is not empty:
Get the current level_size of the queue.
Loop level_size times to process the current layer:
Pop the curr_word.
If curr_word == endWord, return res.
Loop through every word in wordList. If it is not in visited and is_adjacent(curr_word, word) is True, add it to the queue and the visited set.
Increment res by 1 after fully processing the level.
- If the queue empties without finding the
endWord, return 0.
def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
if end_word not in word_list:
return 0
def is_adjacent(w1, w2):
diff = 0
for i in range(len(w1)):
if w1[i] != w2[i]:
diff += 1
if diff > 1:
return False
return diff == 1
q = deque([begin_word])
visited = set([begin_word])
res = 1
while q:
level_size = len(q)
for _ in range(level_size):
curr_word = q.popleft()
if curr_word == end_word:
return res
# Pairwise comparison against the entire word list
for word in word_list:
if word not in visited and is_adjacent(curr_word, word):
visited.add(word)
q.append(word)
res += 1
return 0
Time & Space Complexity
Time complexity: O(N^2 * M)
Reason: M is the length of the words and N is the total number of words. For every single word we process in the worst case (N), we scan the entire wordList again (N) and perform a string comparison taking O(M) time. This results in a massive quadratic slowdown.
Space complexity: O(N)
Reason: The queue and the visited set can both store up to N words.
BFS with Pattern Matching - Optimal
Intuition
We can completely eliminate the O(N) nested loop by pre-processing the wordList into an adjacency list using patterns.
Any word can be represented by a pattern where one character is replaced by a wildcard (e.g., *). For example, the word "hot" matches the patterns "*ot", "h*t", and "ho*".
If we group the words in our list by these patterns, we can instantly find all neighbors! If our current word is "hit", we generate its patterns ("*it", "h*t", "hi*"). We can immediately look up "h*t" in our dictionary and find that "hot" is a valid neighbor, bypassing the need to check the rest of the dictionary.
Algorithm
- If
endWord is not in wordList, return 0.
- Initialize a
defaultdict(list) called adj.
- Add
beginWord to wordList to ensure its patterns are mapped.
- Pre-processing: Loop through every
word in wordList. For each word, generate all possible patterns by replacing one character at a time with *. Append the word to adj[pattern].
- Initialize a
deque with [beginWord] and a visited set with beginWord. Initialize res = 1.
- While the queue is not empty:
- Iterate through the current level of the queue.
- Pop
curr_word. If it equals endWord, return res.
- Generate all patterns for
curr_word.
- For each pattern, look up its mapped neighbors in
adj. If a neighbor hasn't been visited, add it to visited and the queue.
- Increment
res after each level. Return 0 if the queue empties.
def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
if end_word not in word_list:
return 0
# Add begin_word to the list so its patterns are processed
word_list.append(begin_word)
adj = defaultdict(list)
# Build the pattern dictionary
for word in word_list:
for j in range(len(word)):
pattern = word[:j] + "*" + word[j+1:]
adj[pattern].append(word)
q = deque([begin_word])
visited = set([begin_word])
res = 1
while q:
for _ in range(len(q)):
curr_word = q.popleft()
if curr_word == end_word:
return res
# Find neighbors using the pre-built patterns
for j in range(len(curr_word)):
pattern = curr_word[:j] + "*" + curr_word[j+1:]
for neighbor in adj[pattern]:
if neighbor not in visited:
visited.add(neighbor)
q.append(neighbor)
res += 1
return 0
Time & Space Complexity
Time complexity: O(M^2 * N)
Reason: M is the length of each word and N is the total number of words. Pre-processing the wordList takes O(M^2 * N) because for each of the N words, we do M string slices/concatenations that take O(M) time. During the BFS, finding neighbors takes O(M^2) time per word, resulting in a perfectly optimized runtime.
Space complexity: O(M^2 * N)
Reason: The adj dictionary stores N words, and each word is placed into M different pattern buckets. Generating the patterns as string keys takes up auxiliary space proportional to M^2 * N.