Problem Statement
There is a new alien language that uses the English alphabet. However, the order among the letters is unknown to you.
You are given a list of strings words from the alien language's dictionary, where the strings in words are sorted lexicographically by the rules of this new language.
Return a string of the unique letters in the new alien language sorted in lexicographically increasing order by the new language's rules. If there is no valid ordering of letters, return "". If there are multiple valid orderings, return any of them.
Additional information
1 <= words.length <= 100
1 <= words[i].length <= 100
words[i] consists of only lowercase English letters.
Example 1:
Input: words = ["wrt","wrf","er","ett","rftt"]
Output: "wertf"
Explanation: - From "wrt" and "wrf", we can deduce 't' comes before 'f'.
- From "wrt" and "er", we can deduce 'w' comes before 'e'.
- From "er" and "ett", we can deduce 'r' comes before 't'.
- From "ett" and "rftt", we can deduce 'e' comes before 'r'.
Combining these rules, the order is "wertf".
Example 2:
Input: words = ["z","x"]
Output: "zx"
Example 3:
Input: words = ["z","x","z"]
Output: ""
Explanation: The order is invalid, so return "".
Recursive DFS (Topological Sort)
Intuition
Since the words are sorted lexicographically, we can determine the relative order of characters by comparing adjacent words. When we find the first differing character between word1 and word2, we know the character from word1 comes strictly before the character from word2.
This relationship forms a directed graph. The problem then boils down to finding a valid Topological Sort of this graph. If a cycle exists (e.g., 'a' comes before 'b', but 'b' also comes before 'a'), the language is invalid.
We can use a Depth-First Search (DFS) with a 3-state tracking mechanism:
unvisited: The character hasn't been explored.
visiting: The character is in our current DFS path. If we see it again, we found a cycle!
visited: The character and all its descendants have been fully processed and added to the result.
Algorithm
- Initialize an adjacency list
adj mapping every unique character in words to an empty set.
- Iterate through adjacent pairs of words
w1 and w2.
- Edge Case: If
len(w1) > len(w2) and the prefix of w1 matches w2, the dictionary is invalid (a longer word cannot come before a shorter prefix of itself). Return "".
- Find the first differing character between
w1 and w2.
- Add the character from
w2 to the adjacency set of the character from w1. Break the loop, as subsequent characters do not provide valid ordering rules.
- Initialize a
visit dictionary to track node states (where True means visiting and False means visited), and a res list.
- Define
dfs(char):
- If
char is in visit, return visit[char] (returning True means a cycle was detected).
- Set
visit[char] = True.
- Recursively call
dfs on all neighbors of char. If any call returns True, bubble up the cycle detection.
- Set
visit[char] = False (mark as fully processed), and append char to res.
- Return
False.
- Iterate through every character in
adj. If dfs(char) returns True, return "".
- Reverse the
res list (since post-order DFS builds the topological sort backwards) and return it as a string.
def alien_order(words: list[str]) -> str:
# Initialize adjacency list for all unique characters
adj = {c: set() for w in words for c in w}
# Build the graph
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
# Invalid dictionary: prefix comes after the full word
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ""
for j in range(min_len):
if w1[j] != w2[j]:
adj[w1[j]].add(w2[j])
break
visit = {} # True = current path, False = visited completely
res = []
def dfs(char):
if char in visit:
return visit[char]
visit[char] = True
for neighbor in adj[char]:
if dfs(neighbor):
return True
visit[char] = False
res.append(char)
return False
for char in adj:
if dfs(char):
return ""
res.reverse()
return "".join(res)
Time & Space Complexity
Time complexity: O(C)
Reason: C is the total number of characters across all words in the input array. We compare adjacent words character by character, which takes O(C) time. The graph contains at most 26 vertices (the English alphabet) and at most 26^2 edges. The DFS traverses these in O(V + E) time, which simplifies to O(1) relative to C. Overall time is strictly dominated by building the graph, O(C).
Space complexity: O(1) or O(U + E)
Reason: The graph, recursion stack, and result list use space proportional to the number of unique characters (U) and edges (E). Since the alphabet is fixed to 26 lowercase English letters, the space is bounded by a constant O(1).
Iterative BFS (Kahn's Algorithm) - Optimal
Intuition
We can also build our Topological Sort sequentially from start to finish using Kahn's Algorithm (Breadth-First Search).
After building the graph, we count the "in-degree" of every character-that is, how many characters are strictly required to come before it. If a character has an in-degree of 0, it has no prerequisites, and we can safely put it at the start of our alphabet.
We push all 0-in-degree characters into a queue. As we pop them, we append them to our result string and "remove" their outgoing edges, decreasing the in-degree of their neighbors by 1. If a neighbor's in-degree drops to 0, it is now unlocked and gets added to the queue!
Algorithm
- Initialize an adjacency list
adj and an in_degree mapping for every unique character, starting at 0.
- Iterate through adjacent pairs of words to build
adj and increment in_degree for target characters.
- Handle the invalid prefix edge case (
w1 is longer than w2 but shares the same prefix).
- Initialize a
deque (queue) with all characters that have an in_degree of exactly 0.
- Initialize a
res list.
- While the queue is not empty:
- Pop the front character
curr.
- Append
curr to res.
- Iterate through the neighbors of
curr. Decrement their in_degree by 1.
- If a neighbor's
in_degree becomes 0, append it to the queue.
- If the length of
res equals the number of unique characters, we successfully sorted the alphabet. Return "".join(res).
- Otherwise, a cycle blocked us from visiting all characters. Return
"".
def alien_order(words: list[str]) -> str:
# Initialize adjacency list and in-degrees
adj = {c: set() for w in words for c in w}
in_degree = {c: 0 for w in words for c in w}
# Build the graph
for i in range(len(words) - 1):
w1, w2 = words[i], words[i + 1]
min_len = min(len(w1), len(w2))
if len(w1) > len(w2) and w1[:min_len] == w2[:min_len]:
return ""
for j in range(min_len):
if w1[j] != w2[j]:
if w2[j] not in adj[w1[j]]:
adj[w1[j]].add(w2[j])
in_degree[w2[j]] += 1
break
# Queue up all characters with no prerequisites
q = deque([c for c in in_degree if in_degree[c] == 0])
res = []
# Process the queue
while q:
curr = q.popleft()
res.append(curr)
for neighbor in adj[curr]:
in_degree[neighbor] -= 1
if in_degree[neighbor] == 0:
q.append(neighbor)
# Check if a cycle prevented us from sorting all characters
if len(res) != len(in_degree):
return ""
return "".join(res)
Time & Space Complexity
Time complexity: O(C)
Reason: C is the total length of all words combined. Building the adjacency list and in-degree map requires visiting characters pairwise, bounded by O(C). Processing the queue runs in constant time O(V + E) since V <= 26 and E <= 26^2.
Space complexity: O(1)
Reason: The adj map, in_degree dictionary, and the deque allocate memory exclusively for unique English letters (up to 26). Thus, space complexity is strictly bounded by a constant, O(1).