Problem Statement
You are given an array points representing integer coordinates of some points on a 2D-plane, where points[i] = [xi, yi].
The cost of connecting two points [xi, yi] and [xj, yj] is the Manhattan distance between them: |xi - xj| + |yi - yj|, where |val| denotes the absolute value of val.
Return the minimum cost to make all points connected. All points are connected if there is exactly one simple path between any two points.
Additional information
1 <= points.length <= 1000
-10^6 <= xi, yi <= 10^6
- All pairs
(xi, yi) are distinct.
Example 1:
Input: points = [[0,0],[2,2],[3,10],[5,2],[7,0]]
Output: 20
Explanation: We can connect the points as follows:
- Connect
[0,0] and [2,2] with cost 4.
- Connect
[2,2] and [5,2] with cost 3.
- Connect
[5,2] and [7,0] with cost 4.
- Connect
[2,2] and [3,10] with cost 9.
Total cost is 4 + 3 + 4 + 9 = 20. This forms a Minimum Spanning Tree (MST).
Example 2:
Input: points = [[3,12],[-2,5],[-4,1]]
Output: 18
Kruskal's Algorithm (Union-Find)
Intuition
To connect all points with the absolute minimum cost, we need to build a Minimum Spanning Tree (MST). Because any point can technically connect to any other point, this is a "complete graph".
One highly intuitive way to build an MST is Kruskal's Algorithm:
- First, we calculate the Manhattan distance between every single pair of points and generate a massive list of all possible edges.
- We sort these edges from the cheapest cost to the most expensive cost.
- We loop through the sorted edges and greedily try to add the cheapest edge to our network.
- To prevent creating loops (cycles), we use the Union-Find (Disjoint Set) algorithm. If the two points of the edge are already connected in our network, we skip the edge. If they aren't, we merge them and add the cost.
- We stop as soon as we have successfully connected
N - 1 edges!
Algorithm
- Store the number of points
n. If n == 1, return 0.
- Initialize an empty list
edges.
- Use a nested loop to calculate the Manhattan distance between every pair of points
i and j. Append the tuple (distance, i, j) to edges.
- Sort the
edges list.
- Initialize
parent and rank arrays for the Union-Find data structure.
- Define the
find(i) and union(i, j) functions. union should return True if it successfully merged two disjoint sets, and False if a cycle was detected.
- Initialize
res to track the total cost and edges_used to track how many connections we've made.
- Iterate through the sorted
edges. For each cost, u, v:
- If
union(u, v) returns True, add cost to res and increment edges_used.
- If
edges_used == n - 1, we have connected all points! Break early.
- Return
res.
def min_cost_connect_points(points: list[list[int]]) -> int:
n = len(points)
if n == 1:
return 0
# Generate all possible edges
edges = []
for i in range(n):
for j in range(i + 1, n):
dist = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1])
edges.append((dist, i, j))
# Sort edges strictly by cost
edges.sort()
# Union-Find structure
parent = [i for i in range(n)]
rank = [1] * n
def find(i):
p = parent[i]
while p != parent[p]:
parent[p] = parent[parent[p]]
p = parent[p]
return p
def union(i, j):
p1, p2 = find(i), find(j)
if p1 == p2:
return False
if rank[p1] > rank[p2]:
parent[p2] = p1
rank[p1] += rank[p2]
else:
parent[p1] = p2
rank[p2] += rank[p1]
return True
res = 0
edges_used = 0
# Greedily add the cheapest edges
for cost, u, v in edges:
if union(u, v):
res += cost
edges_used += 1
if edges_used == n - 1:
break
return res
Time & Space Complexity
Time complexity: O(N^2 log N)
Reason: Generating the edges takes O(N^2) time. We generate roughly N^2 / 2 edges. Sorting this massive list of edges takes O(E log E) -> O(N^2 log(N^2)) which simplifies to O(N^2 log N). The Union-Find operations take near O(1) time.
Space complexity: O(N^2)
Reason: We must store all N^2 edges in memory at the exact same time before sorting them.
Prim's Algorithm (Min-Heap) - Optimal
Intuition
While Kruskal's algorithm works, generating and sorting every single possible edge up front takes massive amounts of memory and time. We can build the exact same Minimum Spanning Tree faster using Prim's Algorithm.
Prim's algorithm builds the network dynamically, starting from a single point (e.g., point 0). We keep a visited set to track points already in our network. We also maintain a Min-Heap (Priority Queue) to discover the cheapest available edge leading to an unvisited point.
As we pop the cheapest edge from our heap, we add that new point to our network, and then we calculate the distances from this newly added point to all other unvisited points, pushing those new edges into the heap. This completely avoids sorting the entire N^2 edge list up front!
Algorithm
- Store the number of points
n.
- Initialize a
visited set.
- Initialize a
min_heap with the starting node: [(0, 0)] representing (cost, point_index).
- Initialize
res = 0.
- While the length of
visited is less than n:
- Pop the minimum
cost, i from the min_heap.
- If
i is already in visited, skip it (it was a stale, more expensive edge).
- Add
cost to res and add i to visited.
- Loop through all points
j from 0 to n - 1.
- If
j is not in visited, calculate the Manhattan distance between point i and point j.
- Push the new connection
(dist, j) into the min_heap.
- Return
res.
def min_cost_connect_points(points: list[list[int]]) -> int:
n = len(points)
visited = set()
# Heap stores tuples of (cost, target_node_index)
min_heap = [(0, 0)]
res = 0
while len(visited) < n:
cost, i = heapq.heappop(min_heap)
# If the node is already connected, discard the edge
if i in visited:
continue
# Add the cost and mark the node as connected
res += cost
visited.add(i)
# Calculate distances to all unvisited nodes from this newly added node
for j in range(n):
if j not in visited:
dist = abs(points[i][0] - points[j][0]) + abs(points[i][1] - points[j][1])
heapq.heappush(min_heap, (dist, j))
return res
Time & Space Complexity
Time complexity: O(N^2 log N)
Reason: Every time we connect a new node (N times), we calculate distances to all other nodes (O(N) operations), and push them to the heap. Pushing to a heap of size N^2 takes O(log(N^2)) -> O(log N). Thus, N * N * log N gives O(N^2 log N). However, because we don't build and sort everything upfront, this runs significantly faster in practice than Kruskal's.
Space complexity: O(N^2)
Reason: In the worst case, the heap could store multiple redundant edges to the same unvisited nodes, growing up to O(N^2) elements.