Union Find (Disjoint Set Union)
Union-Find (also known as Disjoint Set Union or DSU) is a powerful data structure that transforms
Real-World Analogy
Imagine you are organizing a large conference with attendees from different companies.
Without Union-Find: For each new attendee, you would need to scan through all existing lists to check which company they belong to. If someone from Company A arrives and needs to find all other attendees from their company, you would have to iterate through the entire list.
With Union-Find: You maintain a “representative” for each company group. When a new attendee arrives, you only need to check their representative. If two attendees from the same company arrive separately, you simply connect their representatives. Finding if two attendees are from the same company becomes an instant lookup.
The key insight is that we don’t need to track every relationship individually. We just need to know the “representative” (root) of each group.
Visual Explanation
Let’s visualize how Union-Find structures data as a tree and processes union operations.
graph TD
subgraph "Before Union"
b0[0]
b1[1]
b2[2]
b3[3]
b4[4]
b5[5]
end
subgraph "After Union(0,1) and Union(2,3)"
u0[0] --- u1[1]
u2[2] --- u3[3]
u4[4]
u5[5]
end
subgraph "After Find(0) Path Compression"
p0[0] --> p1[1]
p2[2] --- p3[3]
end
style u1 fill:#aaf,stroke:#333
style u3 fill:#aaf,stroke:#333
style p1 fill:#9f9,stroke:#333
Explanation:
- Initially, each element is its own parent (self-root).
Union(0, 1): Element 0 becomes child of 1 (or vice versa).Union(2, 3): Element 2 becomes child of 3.Find(0): Traverses up to find root (1), then compresses path by making 0 directly point to root.
How It Works
Two Core Operations
1. Find(x): Locate the representative (root) of the set containing element x.
- With path compression: Flattens the tree during traversal, making future finds faster.
2. Union(x, y): Merge the sets containing x and y.
- With union by rank/size: Attaches the smaller tree under the larger one to keep trees balanced.
Key Optimizations
Path Compression: During a find operation, make every node on the path point directly to the root. This transforms the tree structure over time.
Union by Rank: Keep track of tree height (rank). Always attach the shorter tree under the taller one. Alternatively, union by size attaches the smaller set under the larger one.
Together, these optimizations give us amortized time complexity of
When to Use This Pattern
Use Union-Find when you see these characteristics:
- Problems involving connected components (islands, provinces, friend circles)
- Need to determine connectivity between elements dynamically
- Cycle detection in undirected graphs
- Minimum spanning tree construction (Kruskal’s algorithm)
- Problems where elements need to be grouped based on relationships
- Equivalence classes or transitive relationships
- Multiple union and find operations on the same dataset
- Grid problems where adjacent cells need to be connected
Complexity Analysis
| Operation | Time Complexity | Space Complexity | Explanation |
|---|---|---|---|
| Initialization | O(N) | O(N) | We initialize parent array of size N and rank/size array. |
| Find | Amortized O(α(N)) | O(1) | Path compression amortizes over multiple operations. α(N) ≤ 4 for all practical N. |
| Union | Amortized O(α(N)) | O(1) | Depends on two find operations plus constant time comparison. |
| Connected Check | Amortized O(α(N)) | O(1) | Compares roots of two elements. |
Derivation:
- Time: Each operation involves at most one or two finds, which run in amortized. The inverse Ackermann function grows so slowly that for any realistic input size, α(N) ≤ 4.O(α(N))
- Space: We store two arrays of size N (parent and rank/size), givingspace complexity.O(N)
Common Mistakes
Forgetting 1-based indexing: Many LeetCode problems use 1-based node numbering. Initialize Union-Find with
n+1elements.Comparing parents instead of roots: Always use
find(x)to get the root before comparing. Comparingparent[x]andparent[y]directly is incorrect.Not using optimizations: Implementing Union-Find without path compression and union by rank results in
worst-case time per operation.O(N)Grid coordinate conversion: For 2D grid problems, remember to convert (row, col) to 1D index using
row * cols + col.Early termination in cycle detection: In problems like “Redundant Connection”, return the edge that causes a cycle immediately, not after processing all edges.
Next Steps
Now that you understand the concept, check out our Code Templates for ready-to-use implementations in multiple languages, or jump straight to Practice Problems to apply your knowledge to real interview questions.