Ruby Sets: Array’s and Hash’s love child

I’ve recently been diving into algorithm problems in preparation for my technical interviews. A lot of these questions aren’t just about finding a solution, but finding the optimal solution and figuring out its Big O complexity.

For those unfamiliar with Big O notation, it’s just a way of thinking about how well your code performs for really large inputs. Lets take a look at a concrete example.

Say we are given an array of numbers and we wanted to know if there is any combination of 2 numbers which can add up to 11. Here’s one implementation:

Screen Shot 2016-02-16 at 6.47.56 AM

For each number in the first loop, we compute the complement value, and loop through the array to check if it exists. (note: this code is a little sloppy because it does double count values, but for simplicity’s sake, we’re not going to worry about that. also note that for the inner loop we can use “numbers.include?” but under the hood, ruby would be doing the same thing.)

This loop inside a loop means if there are 6 numbers in the array, we are performing 6*6 operations. So as the input array becomes really large, we’ll essentially be doing n*n operations. This means our algorithm has a Big O(n2).

We can actually drastically improve the performance of our algorithm and achieve O(n) runtime by using a data structure called a ‘set’!

A set has a lot of similar functionalities as an array and it has the really fast look up times of a hash.

A set is very similar to an array; it’s a collection of values. However, there are 2 major differences:

  1. A set is unordered. There’s no index to a set. It’s just a pool of values. So while you can do ‘array[3]’ to return the value of the array at index 3, you can’t do this with a set.
  2. You can’t store duplicate values in a set. It’s all unique values.

Other than that, you can manipulate a set in a lot of the similar ways as an array because you have access to the enumerables methods.

However, since a set stores its values as hashes, we get constant time lookups! So no matter how big the input array is, the time to check if a value is in a set will be the same! Using sets is very simple.  We can refactor our code to this:

Screen Shot 2016-02-16 at 7.20.50 AM

There’s a lot of other interesting things you can do with sets. You might not use it much in the wild but it’s a fundamental data structure and something you should absolutely be acquainted with before you head out to the job interview! You can read more about it here.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s