# Understanding UnionFind

Implementation details of the UnionFind data-structure, and its utility for implementing Minimum Spanning Tree algorithms

My desire to efficiently solve problems of increasing complexity, I believe, is the origin of my interest in graph-based algorithms. Thus, on the day my university shut down due to a large amount of snow, it was no surprise I found myself implementing and visualizing not one, but two, minimum spanning tree algorithms.

## Overview

Robert Prim and Joseph Kruskal both developed Minimum Spanning Tree algorithms. This article implements Kruskal’s algorithm, and makes use of the `UnionFind` data-structure. If you want to see `UnionFind` in action check out my visualization.

## Intuition

In this section, I provide some intuition for `UnionFind`. I discuss a sub-optimal implementation that achieves the same results as the one we’ll develop farther down, but, for now, is easier to understand.

Below, I create a list with five sub-list. For the purpose of this example, let’s say each sub-list is a distinct group. There are two operations I’d like to perform: `find` and `union`.

 `````` 1 2 3 4 5 6 7 8 9 10 11 `````` ``````let groups = [['a'],['b'],['c'],['d'],['e']] function find(p) { // return the group element "p" belongs too } function union(p, k) { // replace the groups that contain "p" and "k" are // currently in with the union of both groups } ``````

For simplicity, I’ll say that the name of a group is the index of that group in the list. So originally, each letter is in a different group because each sub-list is at a different index in the `groups` list.

Pretending that the above functions `find` and `union` are implemented, let’s make a few function calls and examine the results.

 `````` 1 2 3 4 5 6 7 8 9 10 `````` ``````find('c') // returns 2 union('b', 'c') // new groups = [['a'],['b', 'c'],['d'],['e']] find('c') // returns 1 union('d', 'e') // new groups = [['a'],['b', 'c'],['d', 'e']] union('a', 'b') // new groups = [['a', 'b', 'c'],['d', 'e']] find('a') // returns 1 find('e') // returns 2 ``````

We see above that `find` always tells us the name given letter’s group. `find` allows us to determine if two different letters are in the same group or not. We can create larger groups by calling `union`. `union` determines what group two input letter belong too and creates the union of those groups.

## An Inefficient Implementation

Below, I write a purposely inefficient implementation of the `find` and `union` methods, so the fast version, we make later, will seem more impressive :)

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 `````` ``````function find(p, groups) { for (let i in groups) { // for each group... let group = groups[i] for (let j in group) { // ...for each letter... let letter = group[j] // ...if the letter is found in this group // return the group's first letter if (letter == p) return i; } } } function union(p, k, groups) { let g1 = find(p, groups) // name of group one let g2 = find(k, groups) // name of group two // create the union of the two groups let union_group = groups[g1] + groups[g2] // remove old groups from the groups list and // add the new union group groups.splice(g1, 1); groups.splice(g2, 1); groups.push(union_group); } ``````

After writing the code above, we’ve now implemented the `UnionFind` data-structure, yep it’s that easy! But, as mentioned, the above implementation is inefficient. Let’s take a look at the runtime; the table below shown a slightly worse than linear increase as elements are added.

n time
100 0.03 sec
1000 0.3 sec
10000 2.3 sec
100000 46 sec
1000000 696 sec (~11 min)

## Testing the Inefficient Implementation

The test we perform each iteration is the following:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 `````` ``````function makeGroups(n) { // makes n groups of size one each with a unique number let groups = [] for (let i=0; i

The test reveals that the implementation has a roughly `linear runtime`. This runtime is what we should expect. Every time we call `find`, we have to look at each element in `linear` order until we find the one we want. And, calling `union` is just two calls to `find`, the construction of the union group, and the removal of two elements: all linear time operations!

## How could it be Faster?

If we want to make this faster, we can start by thinking about the two operations we’re doing `find` and `union`. We could get `find` to have a runtime of O(1) or O(logn) (depending on implementation) by using a dictionary, but, if we are using a dictionary, `union` might still be O(n) because each time we merge two groups we’d have to linearly update each element to point to its new group.

Interestingly, we can make `union` run in O(1) on average (I’ll discuss why we say on average in the conclusion below). We can use a tree-like structure to represent what group an element is currently in and this will avoid the need to update each element individually when building the union of two groups.

The code below shows an example of how we use the dictionary data-structure to store our elements.

 ``````1 2 3 4 5 6 7 8 `````` ``````let groups = { "a": "a", "b": "b", "c": "c", "d": "d", "e": "e" } ``````

The `keys` in the dictionary are the elements we want to keep track of and the `values` are the group that a given element is currently in. At the start, every element is in its own group. Even though we haven’t implemented `find` and `union` yet with this new structure, let’s pretend we have and call some functions.

 ``````1 2 3 4 5 6 7 8 `````` ``````find("a", groups) // returns "a" union("a", "c", groups) // new groups {"a": "c", "b": "b", "c": "c", "d": "d", "e": "e"} union("d", "e", groups) // new groups {"a": "c", "b": "b", "c": "c", "d": "e", "e": "e"} find("a", groups) // returns "c" union("a", "e", groups) // new groups {"a": "c", "b": "b", "c": "e", "d": "e", "e": "e"} find("a", groups) // returns "e" find("d", groups) // returns "e" ``````

Above, we see that `find` appears to be returning the value for a given `key`, but, actually, `find` does a bit more than that. `find` actually looks at the `value` for a given `key` and then pretends the `value` is a `key` and makes another call to `find`. This is confusing, but we see an example of this in `line 6` above. When `find` is asked to locate `a` it sees that `a` belongs to group `c`, but `c` belongs to group `e`! We seemingly could follow this chain forever, but we end when we find an element that belongs to a group with a name that matches itself. We find that `e` is in group `e`, and so we return that `a` is in group `e`.

## I’m still Confused Show Me the Code

The code below implements `find`. We observe that `find` recursively climbs the tree until it reaches a root element. One could say that every element belonging to the same group is a child of the same root element!

 ``````1 2 3 4 5 `````` ``````function find(p, groups) { if (p == groups[p]) return p return find(groups[p], groups) } ``````

That’s a super short definition, but it actually works! However, it’s missing one optimization that will pay-off in the long run. Let’s say I ask `find` what group `a` is in and it has to traverse a tree that’s `one million` in height. Then I call `find` with `a` again. The algorithm would have to make the same traversal of `one million` nodes. So, to improve things, every time I find the group that an element belongs to I set that element to point directly to the root element

To avoid repetitive expensive calls, I set each element to point directly to the `root element` every time I complete a call to find

Here’s the optimized `find` function.

 ``````1 2 3 4 5 6 7 `````` ``````function find(p, groups) { let start = p while (p != groups[p]) p = groups[p]; groups[start] = p return p } ``````

Oh, it’s actually not that much longer. Anyways, now even if it does cost a lot to call `find("a")` it will only cost `O(1)` when we call it a second time! So, we can hope that the average cost of a call will be something closer to `O(1)`.

## Efficient Implementation of Union

Now that we have the `find` function we can use it to implement an efficient `union` function. As discussed, all union has to do is find the groups that two elements belong to and make one element point to the other. Here’s what that looks like:

 ``````1 2 3 4 5 6 `````` ``````function union(p, k, groups) { // the group that p belongs too will point // at the group that k belongs too groups[find(p, groups)] = find(k, groups) } ``````

Awesome! It’s one line. Seems too good to be true, but let’s run the same test as before. If you recall, `one million` elements took 11min to run with the linear implementation. The table below shows the times for this more efficient implementation.

n time
100 1 ms
1000 1 ms
10000 9 ms
100000 2 ms
1000000 0 ms

Ya, it actually said 0 ms :) I though it was broken, but it wasn’t. It’s just that fast… for `one million` elements. So, now the question is `how many elements can it handle?` Let’s do an experiment. Here’s my test code:

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 `````` ``````function test2() { let n = 1000000 while (true) { let groups = makeGroups(n) let start = Date.now(); for (let i=0; i<1000; i++) find(randomMember(n), groups); for (let i=0; i<100; i++) union(randomMember(n), randomMember(n), groups); for (let i=0; i<1000; i++) find(randomMember(n), groups); let time = Date.now() - start; console.log("n: " + n + " time: " + time); n = n * 10; } } ``````

I get this:

 ``````1 2 3 4 `````` ``````n: 1000000 time: 1 n: 10000000 time: 1 FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory ``````

So, ya O(1) on average is `fast`! It runs out of memory before it takes longer than 2ms to complete. Here’s a final version of the code we just created all wrapped-up nicely in a class for you.

 `````` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 `````` ``````class UnionFind { constructor(elements) { // elements is a flat list of numbers or strings this.groups = {} for (let item of elements) this.groups[item] = item } find(p) { let start = p while (p != this.groups[p]) p = this.groups[p]; this.groups[start] = p return p } union(p, k) { this.groups[this.find(p)] = this.find(k) } } ``````

## Conclusion

In this article, we found that `UnionFind` is an easy to implement, fast, and intuitive data-structure. A use cases for `UnionFind` is the implementation of Kruskal’s Minimum Spanning Tree Algorithm. I used the term O(1) on average and by this I meant that sometimes a single call might be slow, but after the first repetitive call the second call will be O(1), and so `on average` a series of calls will have a runtime of O(1). I hope you enjoyed this article. If you want to get in contact, my info’s in the footer below.