What the Hash?

Ian Rones
4 min read · May 8, 2020
[Figure: hash function example]

Let’s say, for example, that we need to use a phonebook. We open it up, brush the dust and cobwebs from between the pages, and look for our aunt’s name. Despite the unnatural weight and size of this brick, a phonebook is pretty organized! We flip to the first letter of the last name and scan through hundreds of lines until, finally, we find the name and number of the person we were trying to call. Easy, right? Okay, let’s move forward in time to our phones. If we look through the contacts page, what do we see? Precisely the same format used by its predecessors, minus the constant flipping and paper cuts. These examples are real-world scenarios of a simple yet powerful data structure used in computer science: hashing. Need another example? A glance at the Dewey Decimal System might change your thinking, or, even more abstract, consider how a password is transformed before being stored in a system’s database. So let’s talk about what makes a hash table so useful, and why.
So hold on, why hash tables? Compared to, say, a linked list or a binary search, why should we use one? Fortunately, if you’re still reading this article and wondering the same thing, that’s great! I’m here to shed some light on the subject. A linked list is excellent at holding datasets, but traversing a linked list isn’t the best timewise, at O(n). A binary search is pretty effective at O(log n). Fast, but not fast enough for what we have in mind; we want O(1), the very fastest. And hash tables can do that, because they tell us exactly where to put the data and where to get the data.

Hash tables are incredibly efficient at searching, inserting, and deleting items because of their fast look-ups. Think of a hash table as an array with added features. And the great thing is that, universally, in any programming language, a hash table always has two distinct parts: the array that houses the data and the hash function, which decides where the data gets stored, giving us an easy way to find that data again.

So let’s look at the first part of the hash table: the table itself. Inside the table are buckets that can hold data. Each bucket can hold a large amount of data, and each has a unique memory reference so that we can find it. The hash function is the second valuable part of the hash table because, without it, we wouldn’t be able to find the correct bucket holding the data we need. Hash functions are deterministic, meaning that every time one runs on the same input, the result is always the same. Hash functions are also irreversible, so we can’t take the result of a hash function and work backwards to recover the original value. A basic hash algorithm is to return the remainder of the key divided by the size of the table. And that is it! Those are the basics of a hash table. So what we have so far is that a hash function is:

  1. Irreversible: Hash functions are one-way; you can’t take the result of a hash function and work backwards to recover the original value.
  2. Deterministic: No matter how many times it runs on the same input, the result is always the same.
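The remainder-based hash described above can be sketched in a few lines. This is a minimal illustration, not a production hash: the table size of 10 and the character-sum keying are assumptions made for the example.

```python
TABLE_SIZE = 10  # illustrative table size

def simple_hash(key: str) -> int:
    """Map a string key to a bucket index in [0, TABLE_SIZE)."""
    # Sum the character codes, then take the remainder by the table size.
    return sum(ord(ch) for ch in key) % TABLE_SIZE

# Deterministic: the same key always lands in the same bucket.
print(simple_hash("aunt") == simple_hash("aunt"))  # True
```

Note that the function only goes one way: from the bucket index alone, there is no way to recover the original key.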

Now that we’ve covered the basics of a hash table, let’s talk about something that happens within hash tables: collisions. Think of valet parking: somehow the valet drivers find a place to park your car, regardless of whether there’s already a car in the spot. So in a matter of hours, your car may be parked in one spot, with three or four cars lined up right behind it. Some valets have multiple levels, allowing only one car per parking spot. A collision happens when the hash function returns an index where some other data is already stored. When that happens, we have a few methods of resolving it, called separate chaining and open addressing.
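A collision is easy to demonstrate with a remainder-style hash like the one described earlier. Here, two different keys happen to produce the same bucket index (the character-sum hash and table size are illustrative assumptions):

```python
TABLE_SIZE = 10

def simple_hash(key: str) -> int:
    return sum(ord(ch) for ch in key) % TABLE_SIZE

# "listen" and "silent" contain exactly the same letters, so this
# character-sum hash sends both keys to the same bucket: a collision.
print(simple_hash("listen") == simple_hash("silent"))  # True
```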

Separate chaining is like the valet parking cars one behind another. If data is already in a bucket, we create a linked list connecting the first value to the second. It is extremely easy to insert into, and it’s used more frequently when there’s an unknown amount of data and an unknown number of times the hash table will be used. However, the hash table can take a bit longer to traverse if a bucket holds a long linked list. So the time complexity for separate chaining in searching, deleting, and inserting is O(1) on average, with the worst case being O(n).
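Separate chaining can be sketched like this. As a simplifying assumption, each bucket here is a Python list of (key, value) pairs standing in for a linked list; the class name, table size, and phone-number values are all made up for the example:

```python
class ChainedHashTable:
    """Minimal sketch of a hash table that resolves collisions by chaining."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per bucket

    def _index(self, key):
        # Character-sum hash, remainder by the table size.
        return sum(ord(ch) for ch in str(key)) % self.size

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: update it
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # otherwise chain a new pair on

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None                      # key not in the table

table = ChainedHashTable()
table.insert("listen", "555-1234")
table.insert("silent", "555-5678")  # collides with "listen", chains instead
print(table.search("silent"))       # 555-5678
```

Both keys land in the same bucket, but the chain keeps them side by side, so neither overwrites the other.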

Open addressing is another solution to collisions. One difference from separate chaining is that the total number of keys must be smaller than or equal to the size of the table. The method changes slightly: instead of chaining extra values onto each bucket, the hash function probes each bucket to check whether data is already in it. If there is, it continues probing past that bucket to find the next available free slot. These methods include linear probing, quadratic probing, and double hashing. Each method is slightly different in how it probes through the buckets, but similar in inserting, deleting, and searching.
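The simplest of the three, linear probing, can be sketched as below: on a collision, step to the next slot (wrapping around) until a free one appears. The class name, table size, and values are illustrative assumptions, and deletion is omitted to keep the sketch short:

```python
class LinearProbingTable:
    """Minimal sketch of open addressing with linear probing."""

    def __init__(self, size=10):
        self.size = size
        self.slots = [None] * size       # each slot holds one (key, value) or None

    def _index(self, key):
        return sum(ord(ch) for ch in str(key)) % self.size

    def insert(self, key, value):
        i = self._index(key)
        for _ in range(self.size):       # probe at most `size` slots
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
            i = (i + 1) % self.size      # linear probe: try the next slot
        raise RuntimeError("table is full")

    def search(self, key):
        i = self._index(key)
        for _ in range(self.size):
            if self.slots[i] is None:
                return None              # hit an empty slot: key isn't here
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.size
        return None

table = LinearProbingTable()
table.insert("listen", "555-1234")
table.insert("silent", "555-5678")  # collides, probes to the next free slot
print(table.search("silent"))       # 555-5678
```

Because every key occupies its own slot, the table can never hold more keys than it has slots, which is exactly the size constraint mentioned above.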


Ian Rones

USAF veteran, junior software engineer, and bartender who loves good food, drinks, taking in the view, and posting some awesome things. Here’s to a great time!