 ### Dmitry Halai

Software engineer

# Union-Find algorithms in Ruby

It’s always interesting to learn something new, especially if it’s fundamental knowledge. Recently I found a very useful course about algorithms. The lectures are meaningful and practical, and I definitely recommend them to everyone who wants to understand basic algorithms.
All the algorithms there are implemented in Java, but I decided to write them in Ruby too. I understand that it may not be the best idea (Java is more suitable for this), but I am doing it for myself. In my view, the language you choose doesn’t matter much in this case; the main thing is understanding the algorithms.
So, in this post I want to tell you about Union-Find algorithms.

## Dynamic connectivity

Firstly, we should understand why this type of algorithm exists. Have a look at the picture: as you can see, there are two points, `p` and `q`. Can you tell whether there is a path connecting `p` and `q`?
This kind of problem is called a `connectivity` problem. Wikipedia describes it like this:

> In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

So, `dynamic connectivity` is a data structure that can be used to describe a graph. But what is it actually? How can we use it to answer the question?

It means that we can describe the way from `p` to `q` through a graph of elements. Each element is a point that can be connected to another one. For example, let’s say we have 10 objects that are initially separate (independent components). We can connect them together and check whether some of them are already connected. The connecting action is called `union`; `find` is a query that checks whether two objects are in the same component. So, for example, we can `union` 4 with 3.

Now we know what `dynamic connectivity` is, but how can we implement it? There are several algorithms for the Union-Find operations.
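The idea can be sketched in a few lines of Ruby (a toy model only; the real implementations follow below):

```ruby
# A toy model of connectivity: ten objects, each initially in its own
# component, identified by ids[i].
ids = (0..9).to_a

# union(4, 3): put object 4 into object 3's component
ids[4] = ids[3]

# find-style check: two objects are connected when their component ids match
puts ids[4] == ids[3]  # => true
puts ids[4] == ids[5]  # => false
```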

## Quick-Find

Quick-Find is the simplest Union-Find algorithm.

```ruby
class QuickFind
  def initialize(size)
    @ids = (0..size - 1).to_a
  end

  def find(index)
    @ids[index]
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    old, new_id = @ids[p], @ids[q]
    @ids.map! { |val| val == old ? new_id : val }
  end

  def count
    @ids.uniq.size
  end
end
```

The `find` query just checks whether `p` and `q` are in the same component (i.e., have the same value). The `union` query changes all entries equal to `@ids[p]` to `@ids[q]`.
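To see what `union` does to the array, here is the same mechanics in isolation (the starting state below is a hypothetical example):

```ruby
# Quick-Find state for 5 objects after union(1, 2) and union(3, 4)
ids = [0, 2, 2, 4, 4]

# union(2, 3): rewrite every entry equal to ids[2] (= 2) to ids[3] (= 4)
old, new_id = ids[2], ids[3]
ids.map! { |val| val == old ? new_id : val }

puts ids.inspect       # => [0, 4, 4, 4, 4]
puts ids[1] == ids[3]  # connected?(1, 3) => true
```

Note that a single `union` touches the whole array, which is exactly why Quick-Find gets slow on large inputs.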

## Quick-Union

`Quick-Union` is a more complex algorithm than `Quick-Find`.

```ruby
class QuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size - 1).to_a
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    @ids[find(p)] = find(q)
    @count -= 1
  end
end
```

`Find` query checks if `p` and `q` have the same root. `Union` changes the id of `p`’s root to the id of `q`’s root.

In this case the `find` operation is more expensive than `union`: finding the root means following the whole path from an element up to its root, and the tree of components can become very deep. In other words, we can `union` a very tall tree, which makes `find` operations costly. That is why we can improve this algorithm by taking the size of the joined components into account.
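The root-chasing can be shown on a small hand-built forest (the parent array below is a hypothetical example, with an iterative `find` equivalent to the recursive one above):

```ruby
# Quick-Union forest for 5 objects: ids[i] is the parent of i,
# and roots point to themselves. 0 <- 1 <- 2 form one tree, 4 <- 3 another.
ids = [0, 0, 1, 4, 4]

# find: follow parent links until a node points to itself
find = lambda do |i|
  i = ids[i] until ids[i] == i
  i
end

puts find.call(2)                  # => 0 (the path is 2 -> 1 -> 0)
puts find.call(2) == find.call(3)  # connected?(2, 3) => false
```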

## Weighted Quick-Union

It’s an improved version of the `Quick-Union` algorithm.

```ruby
class WeightedQuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size - 1).to_a
    @size = Array.new(size, 1)
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    root_p, root_q = find(p), find(q)
    bigger?(root_p, root_q) ? join(root_q, root_p) : join(root_p, root_q)

    @count -= 1
  end

  private

  def bigger?(root_1, root_2)
    @size[root_1] > @size[root_2]
  end

  def join(root_1, root_2)
    @ids[root_1] = root_2
    @size[root_2] += @size[root_1]
  end
end
```

As you can see, we have an array `@size` that holds the sizes of all components. The `union` query compares the component sizes and attaches the smaller tree to the bigger one. This keeps the trees shallow, so `find` needs fewer steps. So, the order of joining is important.
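A quick sanity check of the weighting idea (a standalone sketch, not the class above): with weighted union, no node in a forest of `n` elements can end up deeper than log2(n) links from its root.

```ruby
n = 1024
ids  = (0...n).to_a
size = Array.new(n, 1)

find = lambda { |i| i = ids[i] until ids[i] == i; i }

# Weighted union: always hang the smaller tree under the bigger one.
(1...n).each do |i|
  root_p, root_q = find.call(0), find.call(i)
  next if root_p == root_q
  small, big = size[root_p] <= size[root_q] ? [root_p, root_q] : [root_q, root_p]
  ids[small] = big
  size[big] += size[small]
end

# Depth of a node: number of parent links up to its root
depth = lambda do |i|
  d = 0
  while ids[i] != i
    i = ids[i]
    d += 1
  end
  d
end

puts (0...n).map { |i| depth.call(i) }.max <= Math.log2(n)  # => true
```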

## Weighted Quick-Union With Path Compression

Nonetheless, we can make it even better! Every time we pass through the tree (while computing the root of `p`), we can set the id of each examined node to point closer to that root, so the tree gets flatter with every `find`. All we need is to change a few lines of our `find` method.

```ruby
def find(index)
  return index if @ids[index] == index

  @ids[index] = @ids[@ids[index]]  # point the node to its grandparent
  find(@ids[index])
end
```
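To see the compression at work in isolation, here is an iterative variant of the same trick (path halving) applied to a hand-built chain (the parent array is a made-up example):

```ruby
# A degenerate chain: 4 -> 3 -> 2 -> 1 -> 0 (ids[i] is the parent of i)
ids = [0, 0, 1, 2, 3]

# find with path halving: every examined node is re-pointed to its grandparent
find = lambda do |i|
  while ids[i] != i
    ids[i] = ids[ids[i]]
    i = ids[i]
  end
  i
end

puts find.call(4)  # => 0
puts ids.inspect   # => [0, 0, 0, 2, 2] -- node 4 is now two links from the root
```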

## Tests

Let’s test it!
I have found some data sets: one of them contains 11 union operations, the other 900.
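The files follow the format of the course data sets (`tinyUF.txt` and `mediumUF.txt`, as I understand it): the first line is the number of objects, then one `p q` pair per line. A small made-up sample in that format:

```ruby
# A sample input string: object count first, then "p q" pairs.
data = "10\n4 3\n3 8\n6 5\n9 4\n2 1\n8 9\n5 0\n7 2\n6 1\n1 0\n6 7"

lines = data.split("\n")
size, pairs = lines.shift.to_i, lines

puts size        # => 10 objects
puts pairs.size  # => 11 union operations
```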

```ruby
require 'benchmark'

class Benchmarks
  def tiny
  end

  def medium
  end

  private

  def test(str)
    data = str.split("\n")
    size, pairs = data.shift, data

    Benchmark.bm(20) do |bm|
      %w(QuickFind QuickUnion WeightedQuickUnion WeightedQuickUnionPC).each do |uf_class|
        bm.report(uf_class) do
          uf = Object.const_get(uf_class).new(size.to_i)
          union(uf, pairs)
        end
      end
    end
  end

  def union(uf, pairs)
    pairs.each do |pair|
      p, q = pair.split(' ')
      uf.union(p.to_i, q.to_i)
    end
  end
end
```
```
# tiny data set
                           user     system      total        real
QuickFind              0.000000   0.000000   0.000000 (  0.002838)
QuickUnion             0.000000   0.000000   0.000000 (  0.000091)
WeightedQuickUnion     0.000000   0.000000   0.000000 (  0.000078)
WeightedQuickUnionPC   0.000000   0.000000   0.000000 (  0.000056)

# medium data set
                           user     system      total        real
QuickFind              0.060000   0.010000   0.070000 (  0.051383)
QuickUnion             0.010000   0.000000   0.010000 (  0.002796)
WeightedQuickUnion     0.000000   0.000000   0.000000 (  0.001658)
WeightedQuickUnionPC   0.010000   0.000000   0.010000 (  0.001589)
```

So, everything works as expected, and `Weighted Quick-Union With Path Compression` is the fastest algorithm.

Union-Find performance, in short: `Quick-Find`’s `union` costs O(n) per call; `Quick-Union`’s `find` can cost O(n) in the worst case; `Weighted Quick-Union` bounds `find` by O(log n); and path compression makes the amortized cost per operation nearly constant.

## Conclusion

Union-Find algorithms are widely used. We can see them in different applications:

• Dynamic connectivity
• Games (Go)
• Percolation
• Least common ancestor
• Equivalence of finite state automata
• etc.

Thanks for reading and stay tuned!