Skip to main content

Command Palette

Search for a command to run...

Consistent Hashing

Updated
6 min read
Consistent Hashing

Consistent hashing is one of the most important concepts in distributed systems.

If a system chooses the wrong hashing technique, things may work fine at the beginning, but as the system grows, data can start getting misplaced. That creates a huge problem because users may not be found in the correct database anymore.

This is exactly why companies like Netflix, Amazon, and many large-scale systems use consistent hashing.

But before understanding consistent hashing, we first need to understand normal hashing.

What is Hashing?

Imagine you have multiple databases and millions of users.

Whenever a new user signs up, your system needs to decide:

“In which database should this user be stored?”

For that, we use a hash function.

A good hash function should always return the same output for the same input.

For example:

  • If user ID 5 goes to Database 2 today,

  • then after one month,

  • the same user ID should still point to Database 2.

Otherwise, we won’t be able to find the user data later.

A Bad Hash Function

The first idea that may come to mind is something like this:

function hash(id) {
    return Math.random();
}

This is completely wrong.

Why?

Because every time you call this function, it returns a different value.

So today a user may go to Database 1, and tomorrow the same user may point to Database 3.

That means data lookup becomes impossible.

A Better Approach

Now another idea comes to mind:

function hash(id) {
    return id % 3;
}

Why % 3?

Because we currently have 3 databases.

Let’s say we have these users:

User ID
Don 1
John 2
Jane 3
Alex 4
Tiger 5

Now let’s calculate:

  • Don → 1 % 3 = 1

  • John → 2 % 3 = 2

  • Jane → 3 % 3 = 0

  • Alex → 4 % 3 = 1

  • Tiger → 5 % 3 = 2

So users get distributed across the databases.

And the best part is:

The same ID will always give the same database.

So far, this looks perfect.

But there is a huge problem.

The Real Problem Starts When Servers Increase

Suppose your application becomes successful.

Now 3 databases are not enough anymore.

You decide to add one more database.

Now total databases = 4.

So your hash function becomes:

function hash(id) {
    return id % 4;
}

Let’s check again:

  • Don → 1 % 4 = 1

  • John → 2 % 4 = 2

  • Jane → 3 % 4 = 3

Still okay.

But now look at Alex:

  • Earlier: 4 % 3 = 1

  • Now: 4 % 4 = 0

Alex suddenly points to a completely different database.

Same problem for Tiger:

  • Earlier: 5 % 3 = 2

  • Now: 5 % 4 = 1

So after adding just one database:

  • Most users get remapped

  • Huge amounts of data need to move

  • System performance drops

  • Cache becomes useless

  • Data lookup becomes expensive

This is the exact problem that consistent hashing solves.

What is Consistent Hashing?

Instead of placing databases in a straight line, consistent hashing places them on a circular ring.

You can imagine it like a clock.

For understanding, let’s assume the ring has values from 0 → 12.

Both: databases and user IDs are hashed onto this ring.

Placing Databases on the Ring

Suppose our databases are placed at:

  • Database A → 2

  • Database B → 5

  • Database C → 9

Now when a user comes:

  1. We calculate the hash of the user ID

  2. Move clockwise on the ring

  3. Store the user in the first database we find

Example

Suppose Don’s hash value is 1.

There is no database at 1.

So we move clockwise.

The next database is at 2.

So Don gets stored there.

Now suppose another user hashes to 6.

There is no database at 6.

Moving clockwise, the next database is at 9.

So that user gets stored in Database C.

This is how consistent hashing distributes data.

Why Is This Better?

Now imagine we add one more database at position 12.

What happens?

Only users between:

  • 9 → 12

need to move to the new database.

All other users remain untouched.

This is the biggest advantage of consistent hashing.

Instead of moving almost all data, we move only a small portion.

That makes scaling much faster and cheaper.

What Happens When a Server Is Removed?

The same idea works when a server crashes or gets removed.

Only the users belonging to that server get redistributed to the next server in the ring.

The rest of the system stays stable.

This is why consistent hashing is heavily used in:

  • distributed databases

  • caching systems

  • load balancers

  • distributed storage systems

The Problem With Simple Consistent Hashing

There is still one small issue.

Suppose databases are placed like this:

  • 8

  • 11

  • 12

Now look carefully.

Most of the ring belongs to Database 8.

That means a huge number of users will go there.

So load distribution becomes uneven.

One server becomes overloaded while others stay mostly empty.

Virtual Nodes to the Rescue

To solve this problem, we use virtual nodes.

Instead of placing a database only once on the ring, we place it multiple times.

For example:

  • Database A may appear at:

    • 2

    • 6

    • 11

  • Database B may appear at:

    • 4

    • 8

    • 12

These are called virtual nodes.

They are not separate databases.

They are simply multiple positions pointing to the same real server.

This creates much better balance across the ring.

Now traffic gets distributed more evenly.

Real-World Analogy

Imagine a pizza delivery system.

Without consistent hashing:

  • whenever a new delivery boy joins,

  • almost all delivery areas must be reassigned.

With consistent hashing:

  • only nearby areas are reassigned,

  • while the rest continue normally.

That is exactly how distributed systems scale smoothly.

Conclusion:

Consistent hashing is a brilliant idea because it solves one of the biggest problems in distributed systems:

how to scale servers without moving massive amounts of data.

A normal hashing technique works fine initially, but the moment servers are added or removed, everything gets reshuffled.

Consistent hashing avoids that problem by using a circular hash ring where only a small portion of data changes when the system scales.

And with virtual nodes, it also distributes load evenly across servers.

That is why consistent hashing is widely used in systems like:

Cassandra, DynamoDB, Redis Cluster, Apache Kafka, CDN systems, distributed caches.

If you understand consistent hashing properly, you understand one of the core ideas behind scalable distributed systems.

References :

https://bytebytego.com/courses/system-design-interview/design-consistent-hashing