When you use an Array, you can only look up data by index 0, 1, 2.
But what if you want to lookup data by string, like this?
To do this, the idea is to store the data in an Array, but also keep track of where each string should be stored. Something like this:
You might think that it's very simple to do this - all we have to do is store the mapping "Alice"→0, "Bob"→1, "Charlie"→2 in memory. It turns out that it's not that simple though, because we actually don't know how to store such a mapping efficiently. Looking up someone's data given a string is what we're trying to do in the first place.
Instead of storing the mapping "Alice"→0, ... in memory, the trick is to use a function that creates the mapping, called a "hash function".
To find the mapping for "Charlie", you pass "Charlie" through a hash function h
to get a huge number. You then mod (%) that number by the size of the array. This gives you a valid index in the array for the word "Charlie".
Every name gets assigned a random-looking index, which is a little annoying, but it works.
One hash function h
might give you these indexes:
And a different hash function might give you these indexes:
So the indexes are pretty random.
Sometimes indexes "collide" and point to the same location. Collisions are impossible to avoid, so you need to allow both items to be stored in the same spot. People do that by just storing everything as a LinkedList node (we haven't talked about Linked Lists yet, but you get the general idea, just store the collisions in the same place):
Collisions are bad because they make things slower. To get Charlie's phone number you first need to walk through Bob's phone number. But collisions are pretty rare. The computer makes sure they're rare by always keeping half the slots in the Array empty. If you add too many items, it just doubles the size of the Array.
A "HashMap" is an Array, but where you look data up by strings and numbers that you choose, instead of just indexes 0,1,2,3. It's named after the "hash" function. Lookups are O(1) on average because collisions are rare. Here's a summary of how you implement a HashMap:
Behind the scenes, HashMaps figure out which slot each key goes in, using this formula: key→ h(key) % len(array)