Topological Sort

Definition

A topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge u->v, u comes before v in the ordering.

From another view, a topological sort is a linear extension of the partial order represented by the directed graph (this naturally requires that the graph is acyclic), where a linear extensition of a partial order is a total order (or linear order) that is compatible with the partial order.

Application: Scheduling under Dependency Constraints

If we use nodes to represent tasks, and use directed edges to represent scheduling constraints or dependencies(u->v implies u must be done before v), then a topological sort is a schedule for performing all tasks under the scheduling constraints.

The Central Theorem (my own words)

A directed graph G is a DAG (Directed Acyclic Graph) <=> it has a topological sort.

Algorithms

The usual algorithms for topological sorting runs in O(V + E) time and space.

Kahn’s Algorithm

Choose vertices in the same order as the eventual topological sort.

First, find a list of “start nodes” which have no incoming edges and insert them into a set S; at least one such node must exist in a non-empty acyclic graph. Then, pick one from S, place it in a output list L, update the in-degrees of its successors, and insert those which have no incoming edges into S. Keep doing this until S is empty.

L ← Empty list that will contain the sorted nodes
S ← Set of all nodes with in-degree 0

while S not empty do
    remove a node n from S
    append n to L
    for each node v with an edge e from u to v do
        remove edge e from the graph
        if v has in-degree 0 then
            insert v into S

if graph has edges then
    return error  (graph has at least one cycle)
else
    return L  (a topologically sorted order)

Tarjan’s Algorithm (DFS)

Choose vertices in the reverse order as the eventual topological sort.

If during DFS, an active node is visited (a node is active since we first visit it, and it only become finished when all of the nodes reachable from it have been visited), then the graph contains a cycle.

L ← Empty list that will contain the sorted nodes

while exists nodes not marked FINISHED do
    select an unmarked node n
    dfs(n)

function dfs(node n)
    if n marked FINISHED then
        return
    if n marked ACTIVE then
        stop  (not a DAG)

    mark n ACTIVE

    for each node v with an edge from u to v do
        dfs(v)

    mark n FINISHED
    add n to head of L