Topological Sort
Topological sorting is the secret sauce behind build systems, package managers, and course scheduling. It turns a complex web of dependencies into a clean, linear sequence where every prerequisite is satisfied before its dependent task begins.
In technical terms, a Topological Sort of a directed graph is a linear ordering of its vertices such that for every directed edge $u \to v$, vertex $u$ comes before $v$ in the ordering.
Imagine you are trying to compile a large software project. Some modules depend on others. You cannot compile Module B until Module A is finished. Topological Sort provides the exact order in which you must compile these modules so that no module is ever waiting on a prerequisite.
Real-World Analogy
Think about the dependencies involved in getting dressed for a cold winter day. You cannot put on your shoes before your socks, and you cannot put on your coat before your shirt.
- Tasks: Socks, Shoes, Shirt, Coat, Hat.
- Dependencies:
- Socks $\to$ Shoes
- Shirt $\to$ Coat
- Shirt $\to$ Hat
A valid topological sort would be: Shirt, Socks, Shoes, Coat, Hat. Another valid sort could be: Socks, Shirt, Hat, Shoes, Coat.
Both orders respect all dependencies.
Visual Explanation: Kahn’s Algorithm
Kahn’s Algorithm is the most intuitive way to perform a Topological Sort using a Breadth-First Search (BFS) approach. It works by repeatedly “peeling off” nodes that have no remaining dependencies.
1. The Setup
We track the indegree of every node. The indegree is simply the number of incoming edges (prerequisites) a node has.
flowchart TD
0((0)) --> 1((1))
0((0)) --> 2((2))
1((1)) --> 3((3))
2((2)) --> 3((3))
style 0 fill:#f9f,stroke:#333,stroke-width:4px
- Node 0: Indegree 0 (No prerequisites)
- Node 1: Indegree 1 (Depends on 0)
- Node 2: Indegree 1 (Depends on 0)
- Node 3: Indegree 2 (Depends on 1 and 2)
2. The Process
- Identify Sources: Find all nodes with an indegree of 0. Add them to a Queue.
- Process Sources: Remove a node from the Queue and add it to the final sorted list.
- Update Neighbors: For each neighbor of the removed node, decrement its indegree by 1.
- Repeat: If a neighbor’s indegree becomes 0, it means all its prerequisites are met. Add it to the Queue.
3. Step-by-Step Execution
| Step | Queue | Sorted List | Indegrees (0, 1, 2, 3) |
|---|---|---|---|
| Initial | [0] | [] | (0, 1, 1, 2) |
| Process 0 | [1, 2] | [0] | (-, 0, 0, 2) |
| Process 1 | [2] | [0, 1] | (-, -, 0, 1) |
| Process 2 | [3] | [0, 1, 2] | (-, -, -, 0) |
| Process 3 | [] | [0, 1, 2, 3] | (-, -, -, -) |
When to Use This Pattern
You should consider Topological Sort when your problem involves:
- Dependencies: A set of tasks where some must be completed before others.
- Ordering: Finding a valid sequence to process elements.
- DAG Detection: Verifying if a directed graph contains a cycle.
- Levels/Layers: Problems asking for the “minimum time” to finish tasks where independent tasks can be done in parallel.
Complexity Analysis
Let $V$ be the number of vertices and $E$ be the number of edges.
- Time Complexity:O(V + E)
- We visit every vertex once and iterate through every edge exactly once to calculate and update indegrees.
- Space Complexity:O(V + E)
- We store the adjacency listand an indegree arrayO(V + E).O(V)
- We store the adjacency list
Common Mistakes
- Forgetting Cycle Detection: If the graph is not a DAG, Kahn’s algorithm will complete but the sorted list will contain fewer than $V$ nodes. Always check if the final list length equals the number of nodes.
- Incorrect Indegree Calculation: Ensure you are counting incoming edges, not outgoing ones.
- Graph Representation: Using an edge list instead of an adjacency list can lead tocomplexity if you scan the edges for every node.O(V * E)
- Disconnected Graphs: Ensure your algorithm starts by finding all nodes with indegree 0, not just one starting node.
Next Steps
Now that you understand the theory, let’s look at the implementation:
- Code Templates - Standardized blueprints in 6 languages.
- Practice Problems - Apply the pattern to real interview questions.