Chat Application: WhatsApp
Functional Requirements:
- Should support one-on-one chats
- Should support group chats
- Should have image, video and file-sharing capabilities
- Should indicate delivery and read receipts for messages
- Should indicate the last seen time of users
Non-Functional Requirements:
- Should have very low latency
- Should be highly available
- Messages should be delivered without noticeable lag
- Should be highly scalable
Estimation:
As of early 2025, WhatsApp reportedly handles 100 billion messages per day. This figure includes text messages, media, and voice/video calls.
Storage Estimation:
- Assume 100 billion messages are sent through WhatsApp per day and each message takes 100 bytes on average
- 100 billion/day ∗ 100 bytes = 10 TB/day
- Over 30 days, the required storage capacity becomes:
- 30 ∗ 10 TB/day = 300 TB/month
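The arithmetic above can be checked with a short script. This is only a sketch of the estimate: the constant names are made up for illustration, and a terabyte is taken as 10^12 bytes.

```python
# Back-of-the-envelope storage estimate using the figures from the notes.
MESSAGES_PER_DAY = 100 * 10**9   # 100 billion messages per day
BYTES_PER_MESSAGE = 100          # assumed average message size
BYTES_PER_TB = 10**12            # decimal terabyte (assumption)
DAYS_PER_MONTH = 30

daily_tb = MESSAGES_PER_DAY * BYTES_PER_MESSAGE // BYTES_PER_TB
monthly_tb = daily_tb * DAYS_PER_MONTH

print(f"{daily_tb} TB/day, {monthly_tb} TB/month")  # 10 TB/day, 300 TB/month
```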
Number of servers estimation:
- Let's assume WhatsApp handles around 10 million connections on a single server, which is an optimistic figure for one machine.
- No. of servers = Total connections / No. of connections per server = 2 billion / 10 million = 200 servers
- So, according to the above estimates, we require 200 chat servers.
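The same estimate as a small function, which makes it easy to see how sensitive the result is to the per-server assumption. The function name is hypothetical, and the 1 million connections per server in the second call is purely an illustrative alternative, not a figure from these notes.

```python
def servers_needed(total_connections: int, connections_per_server: int) -> int:
    """Ceiling division: every connection needs a slot on some server."""
    return -(-total_connections // connections_per_server)

TOTAL_CONNECTIONS = 2 * 10**9  # 2 billion, as in the estimate above

# The notes' assumption: 10 million connections per server.
print(servers_needed(TOTAL_CONNECTIONS, 10 * 10**6))  # 200

# Illustrative, more conservative assumption: 1 million per server.
print(servers_needed(TOTAL_CONNECTIONS, 1 * 10**6))   # 2000
```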
High-level design:
- Connection with a WebSocket server:
- Each active WhatsApp device connects to a WebSocket server using the WebSocket protocol.
- WebSocket servers maintain open connections for all active (online) users.
- Multiple servers are deployed to handle billions of users since a single server cannot handle the entire load.
- Each server is responsible for providing a port to every online user.
- A WebSocket manager oversees the mapping of servers, ports, and users.
- The mapping data is stored and managed in a data store cluster, with Redis being used for this purpose.
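The mapping kept by the WebSocket manager can be pictured as a key-value table from user to (server, port). Below is a minimal sketch in which a plain dictionary stands in for the Redis cluster; the class, method, and field names are assumptions made for illustration, not part of any real API.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Endpoint:
    server: str  # identifier of the WebSocket server
    port: int    # port assigned to the user on that server


class WebSocketManager:
    """In-memory stand-in for the Redis-backed user -> endpoint mapping."""

    def __init__(self) -> None:
        self._by_user: Dict[str, Endpoint] = {}

    def register(self, user_id: str, server: str, port: int) -> None:
        # Called when a user opens a connection (comes online).
        self._by_user[user_id] = Endpoint(server, port)

    def unregister(self, user_id: str) -> None:
        # Called when the connection closes (user goes offline).
        self._by_user.pop(user_id, None)

    def lookup(self, user_id: str) -> Optional[Endpoint]:
        # Returns None when the user is offline.
        return self._by_user.get(user_id)
```

With Redis, `register` and `lookup` would roughly correspond to setting and reading a key per user, but the exact schema is not specified in these notes.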
- Send or receive messages:
The system performs the following steps to send messages from user A to user B:
- User A sends a message via their connected WebSocket server.
- User A's WebSocket server queries the WebSocket manager to find the server connected to User B.
- If User B is online, the WebSocket manager provides the details of User B's WebSocket server to User A's WebSocket server.
- User A's WebSocket server sends the message to the message service, which stores it in a MySQL database for processing in first-in-first-out (FIFO) order.
- Messages are deleted from the MySQL database once delivered to the recipient.
- After identifying User B's WebSocket server, communication between User A and User B begins directly via their respective WebSocket servers.
- If User B is offline, messages are stored in the MySQL database.
- When User B comes online, the stored messages are delivered via push notification; messages that are still undelivered after 30 days are deleted permanently.
- Both users communicate with the WebSocket manager to locate each other’s servers.
- Frequent conversations lead to caching optimizations by each WebSocket server, minimizing calls to the WebSocket manager.
- Each WebSocket server caches recent conversation details, including user-to-server mappings.
- If users are connected to the same server, calls to the WebSocket manager are avoided.
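The send path above (store first, locate the recipient, deliver or hold, cache the route) can be sketched as follows. Everything here is a simplified assumption: in-memory objects replace MySQL, Redis, and real sockets, all class and method names are invented, and cache invalidation when a user moves servers is left out.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Set


class MessageStore:
    """Stand-in for the MySQL-backed message service (FIFO per recipient)."""

    def __init__(self) -> None:
        self._pending: Dict[str, Deque[str]] = defaultdict(deque)

    def save(self, recipient: str, text: str) -> None:
        self._pending[recipient].append(text)

    def drain(self, recipient: str) -> List[str]:
        # Return pending messages oldest first and delete them from the store.
        return list(self._pending.pop(recipient, deque()))


class Manager:
    """Stand-in for the WebSocket manager; counts lookups to show caching."""

    def __init__(self) -> None:
        self.user_to_server: Dict[str, "ChatServer"] = {}
        self.lookups = 0

    def locate(self, user: str) -> Optional["ChatServer"]:
        self.lookups += 1
        return self.user_to_server.get(user)


class ChatServer:
    def __init__(self, name: str, manager: Manager, store: MessageStore) -> None:
        self.name, self.manager, self.store = name, manager, store
        self.local_users: Set[str] = set()
        self.route_cache: Dict[str, "ChatServer"] = {}
        self.delivered: Dict[str, List[str]] = defaultdict(list)

    def connect(self, user: str) -> None:
        self.local_users.add(user)
        self.manager.user_to_server[user] = self
        # Flush anything stored while the user was offline.
        self.delivered[user].extend(self.store.drain(user))

    def _route(self, recipient: str) -> Optional["ChatServer"]:
        if recipient in self.local_users:     # same server: skip the manager
            return self
        if recipient in self.route_cache:     # recent conversation: use cache
            return self.route_cache[recipient]
        target = self.manager.locate(recipient)
        if target is not None:
            self.route_cache[recipient] = target
        return target

    def send(self, recipient: str, text: str) -> bool:
        self.store.save(recipient, text)      # persist before delivery
        target = self._route(recipient)
        if target is None:
            return False                      # offline: message stays stored
        target.delivered[recipient].extend(self.store.drain(recipient))
        return True
```

In this sketch a second message to the same remote user does not touch the manager, and a message to an offline user stays in the store until that user connects.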
- Send or receive media files:
- Media files are compressed and encrypted on the device side.
- The compressed and encrypted file is sent to the asset service, which stores it on blob storage and assigns an ID to the file.
- The asset service maintains a hash for each file to avoid duplication. If a file already exists in the blob storage, the same ID is reused instead of uploading the file again.
- The asset service sends the media file ID to the receiver via the message service.
- The receiver uses the ID to download the media file from the blob storage.
- Content is loaded onto a CDN if the asset service receives a high number of requests for specific content.
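The deduplication step can be sketched with a content hash. This is one possible approach, not necessarily how the asset service works: here the SHA-256 digest of the uploaded bytes doubles as the file ID, a dictionary stands in for blob storage, and all names are hypothetical.

```python
import hashlib
from typing import Dict


class AssetService:
    """Toy asset service: deduplicates uploads by content hash."""

    def __init__(self) -> None:
        self._blob_store: Dict[str, bytes] = {}  # file ID -> stored bytes
        self.writes = 0                           # blob-store writes performed

    def upload(self, data: bytes) -> str:
        # Assumption: the hash of the (already compressed and encrypted)
        # payload is used directly as the file ID.
        file_id = hashlib.sha256(data).hexdigest()
        if file_id not in self._blob_store:
            self._blob_store[file_id] = data
            self.writes += 1
        return file_id

    def download(self, file_id: str) -> bytes:
        return self._blob_store[file_id]
```

Uploading the same bytes twice returns the same ID and writes to storage only once; the receiver then calls `download` with the ID it got through the message service.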
- Support for group messages:
- User A Sends a Message to the WebSocket Server:
- User A is connected to a WebSocket server, which maintains an active connection for the user.
- When User A sends a message intended for Group/A, the WebSocket server forwards it to the message service.
- The message service handles the initial processing and routing of the message.
- Message Service Sends the Message to Kafka:
- The message service packages the message with metadata such as group ID (Group/A), sender ID, and timestamp.
- The service publishes the message to Kafka for further processing.
- Kafka's Responsibility (Message Broker):
- Topic Management: Kafka treats each group (e.g., Group/A) as a "topic." Topics are logical channels for organizing messages, where producers (senders) write messages and consumers (receivers) read them.
- Message Storage: Kafka temporarily stores the message in a partitioned log associated with the Group/A topic, ensuring durability even during failures or delays.
- Scalability: Kafka partitions topics for parallel processing, enabling the handling of millions of messages efficiently.
- Producer and Consumer Decoupling: Kafka decouples senders (producers) and receivers (consumers), simplifying the architecture and allowing dynamic scaling of recipients.
- Consumer Coordination: Kafka preserves message order within each partition, so keeping a group's messages on a single partition lets all intended recipients receive them in the order they were published.
- Group Service Retrieves Group Metadata:
- The group service maintains metadata for groups, including:
- User IDs in the group.
- Group ID and status (active/inactive).
- Group attributes like the group icon and number of members.
- Metadata is stored in a MySQL database cluster with:
- Secondary Replicas: Geographically distributed replicas ensure high availability and reduce latency for read operations.
- Redis Cache: Frequently accessed group data is cached to improve response times and reduce database load.
- Group Message Handler Fetches Group Data:
- The group message handler communicates with the group service to retrieve Group/A user data and statuses (online/offline).
- It retrieves the message for Group/A from Kafka and processes it.
- Message Delivery to Group Members:
- The group message handler acts as the consumer for Kafka messages.
- For each user in Group/A:
- If the user is online, the handler forwards the message to the WebSocket server they are connected to for real-time delivery.
- If the user is offline, the message may be stored temporarily or forwarded to a push notification system.
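The group flow above (publish to a per-group topic, consume, fan out by online status) can be sketched without a real broker. In this sketch a deque per group stands in for a Kafka topic, the member and online sets stand in for what the group service would return, and skipping the sender during fan-out is an assumption. All names are invented for illustration.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional, Set, Tuple


@dataclass
class GroupMessage:
    group_id: str
    sender_id: str
    timestamp: float
    body: str


class Broker:
    """Stand-in for Kafka: one FIFO log per group 'topic'."""

    def __init__(self) -> None:
        self._topics: Dict[str, Deque[GroupMessage]] = defaultdict(deque)

    def publish(self, msg: GroupMessage) -> None:
        self._topics[msg.group_id].append(msg)

    def poll(self, group_id: str) -> Optional[GroupMessage]:
        topic = self._topics[group_id]
        return topic.popleft() if topic else None


Delivery = Tuple[str, str]  # (recipient user ID, message body)


def fan_out(broker: Broker, group_id: str, members: List[str],
            online: Set[str]) -> Tuple[List[Delivery], List[Delivery]]:
    """Consume every pending message and split deliveries by online status."""
    realtime: List[Delivery] = []  # would go to the user's WebSocket server
    push: List[Delivery] = []      # would be stored or sent as push notification
    while (msg := broker.poll(group_id)) is not None:
        for user in members:
            if user == msg.sender_id:
                continue           # assumption: do not echo to the sender
            (realtime if user in online else push).append((user, msg.body))
    return realtime, push
```

Because the stand-in topic is a single FIFO queue, every recipient sees messages in publish order, which mirrors keeping a group's messages on one partition.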
Approach to Non-functional Requirements
- Minimizing Latency
- Geographically distributed cache management systems and servers
- CDNs (Content Delivery Networks)
- Consistency
- Assign unique IDs to messages using a sequencer or a similar mechanism
- Use a FIFO messaging queue with strict ordering
- Availability
- Provide multiple WebSocket servers and managers to establish connections between users
- Replication of messages and data associated with users and groups on different servers
- Follow disaster recovery protocols
- Security
- Use end-to-end encryption
- Scalability
- Performance tuning of servers
- Horizontal scalability of services
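As an illustration of the sequencer idea listed under Consistency, here is a minimal single-process sketch that hands out strictly increasing message IDs. A production sequencer would need to be distributed and fault tolerant, which this does not attempt; the class and method names are assumptions.

```python
import itertools
import threading


class Sequencer:
    """Hands out strictly increasing IDs; safe to call from several threads."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            return next(self._counter)


seq = Sequencer()
# Tag each message with an ID at send time.
tagged = [(seq.next_id(), text) for text in ["first", "second", "third"]]

# Even if messages arrive out of order, sorting by ID restores send order.
arrived = [tagged[2], tagged[0], tagged[1]]
print([text for _, text in sorted(arrived)])  # ['first', 'second', 'third']
```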