Sequencer

A sequencer is a system that generates unique, ordered identifiers for events or transactions in a distributed system. These unique IDs ensure consistency and order when multiple services interact or process events in parallel.

In a large distributed system, there can be millions of events happening every second, like liking a video on TikTok, leaving a comment on a blog, or uploading a photo to Snapchat. To keep track of these events, we need to give each one a unique ID.

Another common use of unique IDs is in databases, where they act as primary keys for each entry. Many databases use an auto-increment feature to generate these IDs, but in distributed systems with multiple nodes working independently, that doesn’t work. Instead, we need a way to create unique IDs across all the nodes, especially in systems like a sharded database.

These unique IDs are also helpful for tracking events in logs, making it easier to debug issues. For example, Uber’s Jaeger tracing system uses unique IDs to follow requests as they move through different services, helping engineers understand how events flow through the system.

Requirements for our unique ID system:

Uniqueness: Each event must get its own unique ID so we can easily identify and track it.
Scalability: The system should be able to create at least a billion unique IDs every day to handle a large number of events.
Availability: Since events can happen even within nanoseconds, the system needs to be able to generate IDs for all these events without delay.
Low Latency: The ID generation must be fast with minimal delay.
64-bit numeric ID: We’ll use 64-bit IDs because they provide more than enough space for many years to come.

Total possible IDs: 2⁶⁴

Events per day: 1 billion (or 10⁹)

Events per year: 365 billion (or 365 × 10⁹)

Years before IDs run out: 2⁶⁴ ÷ (365 × 10⁹) ≈ 50.5 million years

Sequencer Architecture

a. Centralized Sequencer:

How It Works: A single centralized node or service is responsible for generating and distributing sequential IDs. Every event or system component needing an ID contacts this service.
Pros: Easy to implement, ensures strict ordering.
Cons: Single point of failure, potential bottleneck under high load.

b. Distributed Sequencer (Preferred for Scalability):

In a distributed system, the sequencer can be designed using multiple independent nodes, each responsible for generating unique sequences.

How It Works:

Each node or shard is assigned a range of IDs. These ranges are disjoint, meaning no overlap.
Nodes independently generate IDs within their range, ensuring both uniqueness and ordering within that range.
A coordinator service assigns new ranges to nodes when they exhaust their current range.
Optionally, nodes can prepend a node identifier to each ID to guarantee global uniqueness (i.e., Node ID + Local Sequence Number).

Pros: Scales well, no single point of failure.
Cons: Ordering is only guaranteed within a node, not across the entire system.

Solution 1: UUID

A UUID (Universally Unique Identifier) is a 128-bit value used to uniquely identify information in computer systems. It ensures uniqueness across systems and time, making it ideal for distributed environments where multiple entities need to generate unique IDs without central coordination.

Types of UUIDs

UUIDv1: Combines the current timestamp with the machine’s unique identifiers (like MAC address). However, it can reveal system info and doesn’t guarantee randomness.
UUIDv4: Generates IDs using random or pseudo-random numbers, ensuring better privacy and uniqueness.
UUIDv3/UUIDv5: Generates IDs based on hashing a namespace and a name, ensuring uniqueness within a specific domain.

Example of a UUID (UUIDv4):

550e8400-e29b-41d4-a716-446655440000

Downsides of UUIDs

Storage and Performance: UUIDs are 128 bits, making them more storage-intensive than 64-bit IDs like those used in some systems (e.g., Snowflake IDs).
Sorting: UUIDs generated randomly (like UUIDv4) don’t have a natural order, making sorting in databases harder.

Complete implementation of the UUID generation API using Spring Boot

Solution 2: Database-based Sequencer

Creating a database-based sequencer using Spring Boot involves using a relational database (e.g., MySQL, PostgreSQL, or H2) to generate sequential IDs. This can be useful when you want to generate unique, ordered IDs for records (e.g., order numbers, transaction IDs) that might need to be consistent across multiple instances of the application.

Design: Use a relational database with an auto-increment column or an atomic counter (e.g., in Redis or DynamoDB) to generate the sequence numbers.
Pros: Simple to implement using existing database features.
Cons: Scalability issues, high contention, single point of failure if using a single instance.

Complete implementation of the database sequencer using Spring Boot

Solution 3: Snowflake ID Generator

Snowflake is a popular solution for generating unique, ordered, 64-bit IDs.

Design: Snowflake is a popular solution for generating unique, ordered, 64-bit IDs.

Timestamp bits (to ensure time-based ordering).
Node ID bits (to ensure uniqueness across distributed nodes).
Sequence number bits (to generate multiple unique IDs within a millisecond).
Example: 41 bits for timestamp, 10 bits for node ID, and 12 bits for sequence number (this allows generating 4096 IDs per millisecond per node).

Pros: Highly scalable, low-latency, distributed, IDs are time-ordered.
Cons: Requires coordination for node IDs, ordering across nodes is not guaranteed.

Complete implementation of the Snowflake ID generator using Spring Boot

Unique ID generation systems used by prominent companies

Instagram's Sharded ID Generator:

Instagram uses a sharded ID generation system, similar to Snowflake but customized for their architecture.
uid = Timestamp + Datacenter ID + Shard ID + Sequence Number

YouTube's YTID:

YouTube uses YTID, a custom ID generation system designed for video identifiers.
It’s a 64-bit unique identifier system that generates globally unique IDs, ensuring no two videos ever share the same ID.

Uber's Unique ID Generation (uID):

Uber uses a custom ID generation system called uID for ride, transaction, and event identifiers.
Similar to Snowflake, Uber’s system generates 64-bit IDs using a combination of:
uid = Timestamp + Datacenter ID + Shard ID + Sequence Number