Disjoint Set Union

Home > Computer Science > Algorithms and data structures > Disjoint Set Union

This is a data structure used to maintain sets of elements that are partitioned into disjoint subsets.

Union-Find algorithm: This is the core algorithm used to implement Disjoint Set Union (DSU) data structure. It consists of two operations, union and find. It enables efficient implementation of operations like connected components, cycle detection and minimum spanning trees.
Path compression technique: This is a technique used to optimize the time complexity of find operation in Union-Find algorithm. It involves flattening the path from a node to its root by making all the nodes along the path directly point to the root node.
Union by rank technique: This is another optimization technique used in Union-Find algorithm to improve the time complexity of the union operation. It involves attaching the shorter tree to the root of the taller tree, based on the rank of the trees.
Disjoint Set data structure: This is a data structure used to maintain a collection of disjoint sets. It supports operations like makeSet, union and find, which are used to form new sets, merge existing sets and determine the set membership of an element respectively.
Kruskal’s algorithm: This is a minimum spanning tree algorithm that uses the DSU data structure to maintain the component information during the construction of the minimum spanning tree.
Connected components: This refers to the subsets of the nodes in a graph such that there is a path between any pair of nodes in the subset. The DSU data structure can be used to efficiently determine the connected components in a graph.
Cycle detection: This refers to determining if a given graph contains a cycle. The Union-Find algorithm can be used to perform cycle detection by keeping track of the parent of each vertex and checking if two vertices have the same parent during the traversal.
Dynamic connectivity: This refers to determining if two vertices in a graph are connected after the addition or removal of edges. The DSU data structure can be used to maintain the connectivity information of the graph in real-time.
Applications of Disjoint Set Union: Disjoint Set Union finds applications in a variety of domains like network connectivity, image processing, computer vision, and machine learning. Understanding the applications of DSU can help in gaining a deeper appreciation of its utility and potential use cases.
Naive implementation: In this implementation, each element in a set maintains a pointer to its parent. To find the representative of a set, we keep following the pointer upward until we reach an element whose parent is itself. The time complexity of this implementation is O(n alpha(n)), where alpha(n) is the inverse Ackermann function, which has a very small value for any practical input size.
Union by rank: In this implementation, each set maintains a rank value, which is an upper bound on the height of the set's tree. When we union two sets, we make the shorter tree a subtree of the taller tree. If the heights are equal, we choose one tree as the root and increment its rank. This implementation has a time complexity of O(alpha(n)).
Path compression: In this implementation, when we traverse a path to find the representative of a set, we make all the elements on the path point directly to the representative. This flattens the tree and reduces the height of subsequent traversals to O(1) on average. The time complexity of this implementation is also O(alpha(n)).
Path splitting: In this implementation, when we traverse a path to find the representative of a set, we split the path into two halves and point all elements on one half directly to the representative. This breaks the path up into smaller pieces, reducing the time for subsequent traversals. The time complexity of this implementation is also O(alpha(n)).
Path halving: In this implementation, when we traverse a path to find the representative of a set, we make every other element on the path point directly to the representative. This reduces the height of the tree by approximately half, but may result in longer paths if the tree is unbalanced. The time complexity of this implementation is also O(alpha(n)).
Weighted union: In this implementation, we maintain the size of each set and always merge the smaller set into the larger set. This guarantees that the height of the resulting tree is at most log(n), where n is the number of elements. The time complexity of this implementation is O(alpha(n)).
Ranked path compression: In this implementation, we combine the path compression and union by rank techniques. In addition to compressing paths, we also update the rank of each set during path compression. The time complexity of this implementation is O(alpha(n)).
"Disjoint-set data structure, also called a union–find data structure or merge–find set, is a data structure that stores a collection of disjoint (non-overlapping) sets."
"It stores a partition of a set into disjoint subsets."
"It provides operations for adding new sets, merging sets (replacing them by their union), and finding a representative member of a set."
"The last operation makes it possible to find out efficiently if any two elements are in the same or different sets."
"Disjoint-set data structures are often identified with a particular implementation called a disjoint-set forest."
"This is a specialized type of forest which performs unions and finds in near-constant amortized time."
"To perform a sequence of m addition, union, or find operations on a disjoint-set forest with n nodes requires total time O(mα(n))."
"Disjoint-set forests do not guarantee this performance on a per-operation basis. Individual union and find operations can take longer than a constant times α(n) time."
"Each operation causes the disjoint-set forest to adjust itself so that successive operations are faster."
"Disjoint-set data structures play a key role in Kruskal's algorithm for finding the minimum spanning tree of a graph."
"The importance of minimum spanning trees means that disjoint-set data structures underlie a wide variety of algorithms."
"Disjoint-set data structures also have applications to symbolic computation."
"Disjoint-set data structures have applications in compilers, especially for register allocation problems."
"Disjoint-set data structure, also called a union–find data structure or merge–find set..."
"A disjoint-set data structure stores a collection of disjoint (non-overlapping) sets."
"The last operation makes it possible to find out efficiently if any two elements are in the same or different sets."
"A disjoint-set forest performs unions and finds in near-constant amortized time."
"To perform a sequence of m addition, union, or find operations on a disjoint-set forest with n nodes requires total time O(mα(n)), where α(n) is the extremely slow-growing inverse Ackermann function."
"Individual union and find operations can take longer than a constant times α(n) time..."
"Disjoint-set forests are both asymptotically optimal and practically efficient."