Union Find (Disjoint Set Union)

Union Find (Disjoint Set Union)

Union-Find (also known as Disjoint Set Union or DSU) is a powerful data structure that transforms

O(N)
connectivity checks into nearly
O(1)
operations. It efficiently manages partitions of a set into disjoint (non-overlapping) subsets, making it essential for problems involving connected components, cycle detection, and minimum spanning trees.

Real-World Analogy

Imagine you are organizing a large conference with attendees from different companies.

  • Without Union-Find: For each new attendee, you would need to scan through all existing lists to check which company they belong to. If someone from Company A arrives and needs to find all other attendees from their company, you would have to iterate through the entire list.

  • With Union-Find: You maintain a “representative” for each company group. When a new attendee arrives, you only need to check their representative. If two attendees from the same company arrive separately, you simply connect their representatives. Finding if two attendees are from the same company becomes an instant lookup.

The key insight is that we don’t need to track every relationship individually. We just need to know the “representative” (root) of each group.

Visual Explanation

Let’s visualize how Union-Find structures data as a tree and processes union operations.

  graph TD
    subgraph "Before Union"
        b0[0]
        b1[1]
        b2[2]
        b3[3]
        b4[4]
        b5[5]
    end

    subgraph "After Union(0,1) and Union(2,3)"
        u0[0] --- u1[1]
        u2[2] --- u3[3]
        u4[4]
        u5[5]
    end

    subgraph "After Find(0) Path Compression"
        p0[0] --> p1[1]
        p2[2] --- p3[3]
    end

    style u1 fill:#aaf,stroke:#333
    style u3 fill:#aaf,stroke:#333
    style p1 fill:#9f9,stroke:#333

Explanation:

  • Initially, each element is its own parent (self-root).
  • Union(0, 1): Element 0 becomes child of 1 (or vice versa).
  • Union(2, 3): Element 2 becomes child of 3.
  • Find(0): Traverses up to find root (1), then compresses path by making 0 directly point to root.

How It Works

Two Core Operations

1. Find(x): Locate the representative (root) of the set containing element x.

  • With path compression: Flattens the tree during traversal, making future finds faster.

2. Union(x, y): Merge the sets containing x and y.

  • With union by rank/size: Attaches the smaller tree under the larger one to keep trees balanced.

Key Optimizations

Path Compression: During a find operation, make every node on the path point directly to the root. This transforms the tree structure over time.

Union by Rank: Keep track of tree height (rank). Always attach the shorter tree under the taller one. Alternatively, union by size attaches the smaller set under the larger one.

Together, these optimizations give us amortized time complexity of

O(α(N))
, where α is the inverse Ackermann function. For all practical purposes, this is effectively
O(1)
.

When to Use This Pattern

Use Union-Find when you see these characteristics:

  • Problems involving connected components (islands, provinces, friend circles)
  • Need to determine connectivity between elements dynamically
  • Cycle detection in undirected graphs
  • Minimum spanning tree construction (Kruskal’s algorithm)
  • Problems where elements need to be grouped based on relationships
  • Equivalence classes or transitive relationships
  • Multiple union and find operations on the same dataset
  • Grid problems where adjacent cells need to be connected

Complexity Analysis

OperationTime ComplexitySpace ComplexityExplanation
Initialization
O(N)
O(N)
We initialize parent array of size N and rank/size array.
FindAmortized
O(α(N))
O(1)
Path compression amortizes over multiple operations. α(N) ≤ 4 for all practical N.
UnionAmortized
O(α(N))
O(1)
Depends on two find operations plus constant time comparison.
Connected CheckAmortized
O(α(N))
O(1)
Compares roots of two elements.

Derivation:

  • Time: Each operation involves at most one or two finds, which run in amortized
    O(α(N))
    . The inverse Ackermann function grows so slowly that for any realistic input size, α(N) ≤ 4.
  • Space: We store two arrays of size N (parent and rank/size), giving
    O(N)
    space complexity.

Common Mistakes

  1. Forgetting 1-based indexing: Many LeetCode problems use 1-based node numbering. Initialize Union-Find with n+1 elements.

  2. Comparing parents instead of roots: Always use find(x) to get the root before comparing. Comparing parent[x] and parent[y] directly is incorrect.

  3. Not using optimizations: Implementing Union-Find without path compression and union by rank results in

    O(N)
    worst-case time per operation.

  4. Grid coordinate conversion: For 2D grid problems, remember to convert (row, col) to 1D index using row * cols + col.

  5. Early termination in cycle detection: In problems like “Redundant Connection”, return the edge that causes a cycle immediately, not after processing all edges.

Next Steps

Now that you understand the concept, check out our Code Templates for ready-to-use implementations in multiple languages, or jump straight to Practice Problems to apply your knowledge to real interview questions.

Pro Tip: Union-Find with union by rank and path compression gives nearly optimal performance. Focus on problem modeling - decide whether to represent grid cells, graph nodes, or complex objects as array indices. The data structure itself is consistent across problems.