Dmitry Halai bio photo

Dmitry Halai

Software engineer

Email Twitter Facebook LinkedIn Github

It’s always interesting to learn something new, especially if it’s fundamental knowledge. Recently I have found very useful course about algorithms. Lectures are meaningful and practical. I definetly recommend them for everyone who wants to understand basic algorithms.
All algorithms are implemented in Java, but I decided to write them in Ruby too. I understand, that it may be not a good idea (Java is more suitable for this), but I will do it for myself. In my view, it doesn’t matter what language you choose in this case. The main thing is understanding algorithms.
So, in this post I want to tell you about Union-Find algorithms.

Dynamic connectivity

Firstly, we should understand the reason of existence of this type of algorithms. Just have a look at the picture:

As you can see, there are 2 points: p and q. Could you get an answer: is there a path connecting p and q?
This type of problems is called connectivity problem. Wikipedia tells us about this like

In computing and graph theory, a dynamic connectivity structure is a data structure that dynamically maintains information about the connected components of a graph.

So, dynamic connectivity is a data structure, which can be used for describing a graph. But, what is it actually? How can we use it for answering the question?
It means, that we can describe the way from p to q through the graph of elements. Each element is a point, which can be connected with another one. For example, let’s say we have 10 objects and they are separate initially (they are independent components): We can connect it together and check if some of them have already connected. The connection action is called union. Find is a query, wich checks if two objects are in the same component. So, we can make union 4 with 3:

There are more connectivity examples:

Now, we know what is dynamic connectivity, but how can we implement it? Actually, there are some algorithms to do Union-Find operations.

Quick-Find

Quick Find is a easiest algorithm of UF.

class QuickFind
  def initialize(size)
    @ids = (0..size-1).to_a
  end

  def find(index)
    @ids[index]
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    old = @ids[p]
    new = @ids[q]

    @ids.map! { |val| val = (val == old)? new : val }
  end

  def count
    @ids.uniq.size
  end
end

Find query just checks whether p and q are in the same component (have the same value). Union query changes all entries with @ids[p] to @ids[q]

Quick-Union

Quick-Union is a more complex algorithm then Quick-Find

class QuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size-1).to_a
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    @ids[find(p)] = find(q)
    @count -= 1
  end
end

Find query checks if p and q have the same root. Union changes the id of p’s root to the id of q’s root.

In this case find operation is more complicated, then union. Finding the root means pass all path through graph from element to its root. A tree of the components can be very deep. In other words we can union very tall tree, which takes a lot of find operations. That is why we can improve this algorithm by checking the order of the joined objects.

Weighted Quick-Union

It’s an improved version of the Quick-Union algorithm.

class WeightedQuickUnion
  attr_reader :count

  def initialize(size)
    @ids = (0..size-1).to_a
    @size = Array.new(size, 1)
    @count = size
  end

  def find(index)
    @ids[index] == index ? index : find(@ids[index])
  end

  def connected?(p, q)
    find(p) == find(q)
  end

  def union(p, q)
    return if connected?(p, q)

    root_p, root_q = find(p), find(q)
    bigger?(root_p, root_q) ? join(root_q, root_p) : join(root_p, root_q)

    @count -= 1
  end

  private
  def bigger?(root_1, root_2)
    @size[root_1] > @size[root_2]
  end

  def join(root_1, root_2)
    @ids[root_1] = root_2
    @size[root_2] += @size[root_1]
  end
end

As you can see, we have an array @size, which includes the sizes of all components. Union query checks the size of components and makes join between smaller and bigger. It means we need less find operations. So, the order is important.

Weighted Quick-Union With Path Compression

Nonetheless, we can make it even better! Every time when we pass trough the tree (after computing the root of p), we can set the id of each examined node to point to that root. For example, we have the tree:

After find it will be:

Everything what we need is just change few lines of our find method.

def find(index)
  return index if @ids[index] == index

  @ids[index] = @ids[@ids[index]]
  find(@ids[index])
end

Tests

Let’s test it!
I have found some data sets. One of them includes 11 union operations, another one 900.

class Benchmarks
  def tiny
    test File.read('./tiny_UF.txt')
  end

  def medium
    test File.read('./medium_UF.txt')
  end

  private
  def test(str)
    data = str.split("\n")
    size, pairs = data.shift, data

    Benchmark.bm(20) do |bm|
      %w(QuickFind QuickUnion WeightedQuickUnion WeightedQuickUnionPC).each do |uf_class|
        bm.report(uf_class) do
          uf = Object.const_get(uf_class).new size.to_i
          union(uf, pairs)
        end
      end
    end
  end

  def union(uf, pairs)
    pairs.each do |pair|
      p, q = pair.split(' ')
      uf.union(p.to_i, q.to_i)
    end
  end
end
# tiny data set
  user       system     total     real
QuickFind
  0.000000   0.000000   0.000000 (0.002838)
QuickUnion
  0.000000   0.000000   0.000000 (0.000091)
WeightedQuickUnion
  0.000000   0.000000   0.000000 (0.000078)
WeightedQuickUnionPC
  0.000000   0.000000   0.000000 (0.000056)

# medium data set
  user       system     total     real
QuickFind
  0.060000   0.010000   0.070000 (0.051383)
QuickUnion
  0.010000   0.000000   0.010000 (0.002796)
WeightedQuickUnion
  0.000000   0.000000   0.000000 (0.001658)
WeightedQuickUnionPC
  0.010000   0.000000   0.010000 (0.001589)

So, everything works like we expected and Weighted Quick-Union With Path Compression is the quickest algorithm.

Union-Find algorithms performance:

Conclusion

Union-Find algorithms are widely used. We can see them in different applications:

  • Dynamic connectivity
  • Games (Go)
  • Percolation
  • Least common ancestor
  • Equivalence of finite state automata
  • etc..

More information about UF you can read here.
Thanks Coursera for this amazing course.

In the next publication I will tell you about stacks and queues.
Thx for reading and stay tuned! coffee