Algorithms

Union-Find algorithms in Ruby

Dmitry Halai

Feb 6, 2016 • 5 min read

It’s always interesting to learn something new, especially if it’s fundamental knowledge.

Recently I have found a very interesting course about algorithms. Lectures are meaningful and practical. I recommend them to everyone who wants to understand basic algorithms.
All algorithms are implemented in Java, but I decided to write them in Ruby too. In my view, it doesn’t matter what language you choose in this case. The main goal is to better understand these algorithms.
So, in this post, I want to tell you about Union-Find (UF) algorithms.

Dynamic connectivity

Firstly, we should understand the reason of this type of algorithm. Just have a look at the picture:

As you can see, there are 2 points: p and q. Question: is there a path between p and q?
Not so easy to answer, right? 🙂

This type of problem is called the connectivity problem. Wikipedia says:

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

So, dynamic connectivity is a data structure, which can be used for describing a graph. But, what does it mean? How can we use it for answering the question?
It means, we can describe the way from p to q through the graph of elements. Each element is a point, which can be connected with another one. For example, let’s say we have 10 objects and they are separated initially (they are independent components):

We can connect and check if some of them are already connected. The connection action is called union. TheFind is an operation that checks if two objects are in the same component. So, we can union 4 with 3:

Here are more connectivity examples:

Now, we know what is the dynamic connectivity, but how can we implement it? There are some algorithms to do Union-Find operations.

Quick-Find

The Quick Find is the easiest UF algorithm.

class QuickFind
  def initialize(size)
    @ids = (0..size-1).to_a
  end

  def find(index)
    @ids[index]
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    old = @ids[p]
    new = @ids[q]

    @ids.map! { |val| val = (val == old)? new : val }
  end

  def count
    @ids.uniq.size
  end
end

The find operation just checks whether p and q are in the same component (have the same value). The union operation changes all entries with @ids[p] to @ids[q]

Quick-Union

The Quick-Union is a more complex algorithm.

class QuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size-1).to_a
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    @ids[find(p)] = find(q)
    @count -= 1
  end
end

The find operation checks if p and q have the same root. The union operation changes the id of p’s root to the id of q’s root.

In that case, the find operation is more complicated, then the union one. Finding the root means passing all paths through the graph from a particular element to its root. A tree of components can be very deep in that case. In other words, we can union very tall trees, which require a lot of find operations. That is why this algorithm can be improved by checking the order of the joined objects.

Weighted Quick-Union

It’s an improved version of the Quick-Union algorithm.

class WeightedQuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size-1).to_a
    @size = Array.new(size, 1)
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    root_p, root_q = find(p), find(q)
    bigger?(root_p, root_q) ? join(root_q, root_p) : join(root_p, root_q)

    @count -= 1
  end

  private

  def bigger?(root_1, root_2)
    @size[root_1] > @size[root_2]
  end

  def join(root_1, root_2)
    @ids[root_1] = root_2
    @size[root_2] += @size[root_1]
  end
end

As you can see, we have an array (@size), which includes the sizes of all components. The union method checks the size of components and joins smaller ones with bigger ones. It means fewer find operations are required. Thus, the order is important.

However, we can make it even better!

Weighted Quick-Union With Path Compression

Every time when we go through the tree (after computing the root of p), we can make the id of each examined node point to that root.
For example, we have the tree:

After running the find operation, the tree will be changed to:

All we need is just change a few lines of our find method.

def find(index)
  return index if @ids[index] == index

  @ids[index] = @ids[@ids[index]]
  find(@ids[index])
end

Well, it is time to test it!

Tests

I found some data sets. One of them includes 11 union operations, and another one 900.

class Benchmarks
  def tiny
    test File.read('./tiny_UF.txt')
  end

  def medium
    test File.read('./medium_UF.txt')
  end

  private
  def test(str)
    data = str.split("\n")
    size, pairs = data.shift, data

    Benchmark.bm(20) do |bm|
      %w(QuickFind QuickUnion WeightedQuickUnion WeightedQuickUnionPC).each do |uf_class|
        bm.report(uf_class) do
          uf = Object.const_get(uf_class).new size.to_i
          union(uf, pairs)
        end
      end
    end
  end

  def union(uf, pairs)
    pairs.each do |pair|
      p, q = pair.split(' ')
      uf.union(p.to_i, q.to_i)
    end
  end
end

# tiny data set
  user       system     total     real
QuickFind
  0.000000   0.000000   0.000000 (0.002838)
QuickUnion
  0.000000   0.000000   0.000000 (0.000091)
WeightedQuickUnion
  0.000000   0.000000   0.000000 (0.000078)
WeightedQuickUnionPC
  0.000000   0.000000   0.000000 (0.000056)

# medium data set
  user       system     total     real
QuickFind
  0.060000   0.010000   0.070000 (0.051383)
QuickUnion
  0.010000   0.000000   0.010000 (0.002796)
WeightedQuickUnion
  0.000000   0.000000   0.000000 (0.001658)
WeightedQuickUnionPC
  0.010000   0.000000   0.010000 (0.001589)

As you can see, everything works as we expected and Weighted Quick-Union With Path Compression is the quickest algorithm.

Here's the Union-Find algorithms performance comparison:

Conclusion

Union-Find algorithms are widely used. We can see them in different applications:

Dynamic connectivity
Games (Go)
Percolation
Least common ancestor
Equivalence of finite state automata
etc..

Thanks, Coursera for the amazing course and thank you for reading it 😛.

PS. More information about UF algorithms can be found in the course-related book's chapter.