Microsoft: Word Ladder — Backend Engineering Interview Q&A (2026)

Word Ladder

Microsoft 🧠 Hard Programming Hash Tables Strings BFS DSA

Beginner Mode

Problem Statement

A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:

Every adjacent pair of words differs by a single letter.
Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
sk == endWord

Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.

Additional information

1 <= beginWord.length <= 10
endWord.length == beginWord.length
1 <= wordList.length <= 5000
wordList[i].length == beginWord.length
beginWord, endWord, and wordList[i] consist of lowercase English letters.
beginWord != endWord
All the words in wordList are unique.

Example 1:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]

Output: 5

Explanation: One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.

Example 2:

Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]

Output: 0

Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.

Brute Force (BFS with Pairwise Comparison)

Intuition

We are looking for the shortest path between a starting word and an ending word. Finding the shortest path in an unweighted graph is the textbook use case for Breadth-First Search (BFS).

In our graph, every word is a node, and an edge exists between two words if they differ by exactly one character. In a brute-force approach, to find the "neighbors" of the word we just popped from our BFS queue, we can iterate through the entire wordList and compare every single character to see if they differ by exactly 1 letter.

Algorithm

If endWord is not in wordList, return 0 immediately.
Create a helper function is_adjacent(word1, word2) that returns True if the words differ by exactly 1 character.
Initialize a deque (queue) with [beginWord] and a visited set with beginWord.
Initialize a res counter to 1 (since the sequence length includes the starting word).
While the queue is not empty:

Get the current level_size of the queue.
Loop level_size times to process the current layer:
Pop the curr_word.
If curr_word == endWord, return res.
Loop through every word in wordList. If it is not in visited and is_adjacent(curr_word, word) is True, add it to the queue and the visited set.
Increment res by 1 after fully processing the level.

If the queue empties without finding the endWord, return 0.

def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    if end_word not in word_list:
        return 0
        
    def is_adjacent(w1, w2):
        diff = 0
        for i in range(len(w1)):
            if w1[i] != w2[i]:
                diff += 1
            if diff > 1:
                return False
        return diff == 1
        
    q = deque([begin_word])
    visited = set([begin_word])
    res = 1
    
    while q:
        level_size = len(q)
        for _ in range(level_size):
            curr_word = q.popleft()
            
            if curr_word == end_word:
                return res
                
            # Pairwise comparison against the entire word list
            for word in word_list:
                if word not in visited and is_adjacent(curr_word, word):
                    visited.add(word)
                    q.append(word)
                    
        res += 1
        
    return 0

Time & Space Complexity

Time complexity: O(N^2 * M)
Reason: M is the length of the words and N is the total number of words. For every single word we process in the worst case (N), we scan the entire wordList again (N) and perform a string comparison taking O(M) time. This results in a massive quadratic slowdown.
Space complexity: O(N)
Reason: The queue and the visited set can both store up to N words.

BFS with Pattern Matching - Optimal

Intuition

We can completely eliminate the O(N) nested loop by pre-processing the wordList into an adjacency list using patterns.

Any word can be represented by a pattern where one character is replaced by a wildcard (e.g., *). For example, the word "hot" matches the patterns "*ot", "h*t", and "ho*".
If we group the words in our list by these patterns, we can instantly find all neighbors! If our current word is "hit", we generate its patterns ("*it", "h*t", "hi*"). We can immediately look up "h*t" in our dictionary and find that "hot" is a valid neighbor, bypassing the need to check the rest of the dictionary.

Algorithm

If endWord is not in wordList, return 0.
Initialize a defaultdict(list) called adj.
Add beginWord to wordList to ensure its patterns are mapped.
Pre-processing: Loop through every word in wordList. For each word, generate all possible patterns by replacing one character at a time with *. Append the word to adj[pattern].
Initialize a deque with [beginWord] and a visited set with beginWord. Initialize res = 1.
While the queue is not empty:

Iterate through the current level of the queue.
Pop curr_word. If it equals endWord, return res.
Generate all patterns for curr_word.
For each pattern, look up its mapped neighbors in adj. If a neighbor hasn't been visited, add it to visited and the queue.

Increment res after each level. Return 0 if the queue empties.

def ladder_length(begin_word: str, end_word: str, word_list: list[str]) -> int:
    if end_word not in word_list:
        return 0
        
    # Add begin_word to the list so its patterns are processed
    word_list.append(begin_word)
    adj = defaultdict(list)
    
    # Build the pattern dictionary
    for word in word_list:
        for j in range(len(word)):
            pattern = word[:j] + "*" + word[j+1:]
            adj[pattern].append(word)
            
    q = deque([begin_word])
    visited = set([begin_word])
    res = 1
    
    while q:
        for _ in range(len(q)):
            curr_word = q.popleft()
            
            if curr_word == end_word:
                return res
                
            # Find neighbors using the pre-built patterns
            for j in range(len(curr_word)):
                pattern = curr_word[:j] + "*" + curr_word[j+1:]
                for neighbor in adj[pattern]:
                    if neighbor not in visited:
                        visited.add(neighbor)
                        q.append(neighbor)
                        
        res += 1
        
    return 0

Time & Space Complexity

Time complexity: O(M^2 * N)
Reason: M is the length of each word and N is the total number of words. Pre-processing the wordList takes O(M^2 * N) because for each of the N words, we do M string slices/concatenations that take O(M) time. During the BFS, finding neighbors takes O(M^2) time per word, resulting in a perfectly optimized runtime.
Space complexity: O(M^2 * N)
Reason: The adj dictionary stores N words, and each word is placed into M different pattern buckets. Generating the patterns as string keys takes up auxiliary space proportional to M^2 * N.

Quick Solution

Code Environment

Track

	Question	Difficulty	Company	Access

Need more practice in this area? Explore more questions →