CHAPTER 10: DESIGN A NOTIFICATION SYSTEM
Step 1 - Understand the problem and establish design scope
Candidate: What types of notifications does the system support?
Interviewer: Push notification, SMS message, and email.
Candidate: Is it a real-time system?
Interviewer: Let us say it is a soft real-time system. We want a user to receive notifications as soon as possible. However, if the system is under a high workload, a slight delay is acceptable.
Candidate: What are the supported devices?
Interviewer: iOS devices, Android devices, and laptop/desktop.
Candidate: What triggers notifications?
Interviewer: Notifications can be triggered by client applications. They can also be scheduled on the server side.
Candidate: Will users be able to opt out?
Interviewer: Yes, users who choose to opt out will no longer receive notifications.
Candidate: How many notifications are sent out each day?
Interviewer: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails.
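The daily volumes above can be turned into a rough throughput estimate. The sketch below is only a back-of-the-envelope calculation; it assumes traffic is spread evenly over the day, which real traffic is not, so peak load will be higher than these averages. The names `daily_volume` and `average_per_second` are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily volumes stated by the interviewer.
daily_volume = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}


def average_per_second(per_day: int) -> float:
    """Average rate, assuming an even spread over 24 hours."""
    return per_day / SECONDS_PER_DAY


for channel, per_day in daily_volume.items():
    print(f"{channel}: about {average_per_second(per_day):.0f} per second")

total = sum(daily_volume.values())
print(f"total: about {average_per_second(total):.0f} per second")
```

On average this works out to roughly 116 push notifications, 12 SMS messages, and 58 emails per second, or about 185 notifications per second in total.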
Step 2 - Propose high-level design and get buy-in
• Different types of notifications
• Contact info gathering flow
• Notification sending/receiving flow
Different types of notifications
Android push notification
SMS message
Email
Contact info gathering flow
Notification sending/receiving flow
Three problems are identified in this design:
• Single point of failure (SPOF): A single notification server means SPOF.
• Hard to scale: The notification system handles everything related to push notifications on one server. It is challenging to scale databases, caches, and different notification
processing components independently.
• Performance bottleneck: Processing and sending notifications can be resource-intensive.
For example, constructing HTML pages and waiting for responses from third-party
services could take time. Handling everything in one system can result in system
overload, especially during peak hours.
High-level design (improved)
• Move the database and cache out of the notification server.
• Add more notification servers and set up automatic horizontal scaling.
• Introduce message queues to decouple the system components.
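The decoupling in the last bullet can be sketched as follows. This is a minimal in-process illustration, not the design's actual implementation: it uses Python's standard `queue.Queue` as a stand-in for a real message queue, and it assumes one queue per notification channel so that a slow channel does not hold up the others. The names `Notification`, `enqueue`, and `drain` are hypothetical.

```python
import queue
from dataclasses import dataclass


@dataclass
class Notification:
    event_id: str
    channel: str    # "push", "sms", or "email"
    recipient: str
    body: str


# Assumption: one queue per channel, so channels are buffered independently.
queues = {name: queue.Queue() for name in ("push", "sms", "email")}


def enqueue(notification: Notification) -> None:
    """Called by a notification server: hand the event off and return."""
    queues[notification.channel].put(notification)


def drain(channel: str, send) -> int:
    """Called by a worker: send everything currently queued for one channel."""
    sent = 0
    while True:
        try:
            notification = queues[channel].get_nowait()
        except queue.Empty:
            return sent
        send(notification)
        sent += 1
```

The notification server only calls `enqueue`, and workers only call `drain`, so each side can be scaled without changing the other.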
Step 3 - Design deep dive
Reliability
How to prevent data loss?
To prevent data loss, the notification system persists notification data in a database and implements a retry mechanism.
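One way to picture persistence combined with retries is sketched below. It rests on several assumptions: a plain dictionary (`notification_log`) stands in for the database, the `send` callable stands in for a third-party delivery service, and the exponential backoff policy and the status names are illustrative choices rather than part of the design described here.

```python
import time

# Stand-in for a database table: event_id -> delivery status.
notification_log = {}


def send_with_retry(event_id, payload, send, max_attempts=3, base_delay=0.5):
    """Record the notification first, then try to deliver it with retries."""
    # Persist before sending so the event survives a worker crash.
    notification_log[event_id] = "pending"
    for attempt in range(max_attempts):
        try:
            send(payload)
        except Exception:
            # Back off before the next attempt: 0.5 s, 1 s, 2 s, ...
            if attempt < max_attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
            continue
        notification_log[event_id] = "sent"
        return True
    notification_log[event_id] = "failed"
    return False
```

A notification left in the "failed" or "pending" state can be picked up later, for example by a job that re-queues it.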
Will recipients receive a notification exactly once?
Although a notification is delivered exactly once most of the time, the distributed nature of the system could result in duplicate notifications.
To reduce duplicates, we introduce a dedupe mechanism and handle each failure case carefully.
When a notification event first arrives, we check whether it has been seen before by looking up its event ID. If it has, the event is discarded; otherwise, the notification is sent.
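The event ID check can be sketched in a few lines. This is a simplified illustration that keeps seen IDs in an in-memory set; a real deployment would presumably need shared storage and some expiry policy, neither of which is specified here. The names `seen_event_ids` and `should_process` are hypothetical.

```python
# Assumption: an in-memory set stands in for a shared store of seen event IDs.
seen_event_ids = set()


def should_process(event_id: str) -> bool:
    """Return True the first time an event ID is seen, False afterwards."""
    if event_id in seen_event_ids:
        return False  # duplicate: discard
    seen_event_ids.add(event_id)
    return True


for event_id in ["evt-1", "evt-2", "evt-1"]:
    action = "send" if should_process(event_id) else "discard"
    print(event_id, action)
```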
Additional components and considerations
Notification template
Notification setting
Before any notification is sent to a user, we first check whether the user has opted in to receive this type of notification.
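A minimal sketch of that check follows. The settings layout, keyed by user and channel, and the rule that a missing entry counts as "not opted in" are both assumptions made for illustration; `notification_settings`, `is_opted_in`, and `send_if_allowed` are hypothetical names.

```python
# Hypothetical settings table: (user_id, channel) -> opted in?
notification_settings = {
    ("user-1", "push"): True,
    ("user-1", "email"): False,
}


def is_opted_in(user_id: str, channel: str) -> bool:
    # Assumption: a missing row means the user has not opted in.
    return notification_settings.get((user_id, channel), False)


def send_if_allowed(user_id: str, channel: str, message: str, send) -> bool:
    """Send only when the user's settings allow this channel."""
    if not is_opted_in(user_id, channel):
        return False
    send(user_id, channel, message)
    return True
```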
Rate limiting
Retry mechanism
Security in push notifications
Monitor queued notifications
A key metric to monitor is the total number of queued notifications. If the number is large, workers are not processing notification events fast enough. To avoid delays in notification delivery, more workers are needed.
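As a rough illustration of how queue depth might translate into a worker count, the sketch below estimates how many workers would be needed to drain the current backlog within a target time. The per-worker throughput and the target drain time are made-up inputs, and the function ignores events that arrive while the backlog is being drained.

```python
import math


def workers_needed(queued: int, per_worker_rate: float, target_seconds: float) -> int:
    """Workers required to clear `queued` events within `target_seconds`.

    per_worker_rate is events per second handled by one worker (assumed).
    """
    if queued <= 0:
        return 0
    return math.ceil(queued / (per_worker_rate * target_seconds))


# Example: 6,000 queued events, 50 events/s per worker, 60 s target.
print(workers_needed(6_000, per_worker_rate=50, target_seconds=60))
```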
Events tracking
Updated design
Step 4 - Wrap up
Besides the high-level design, we dug deep into more components and optimizations.
• Reliability: We proposed a robust retry mechanism to minimize the failure rate.
• Security: An appKey/appSecret pair is used to ensure only verified clients can send
notifications.
• Tracking and monitoring: These are implemented at any stage of a notification flow to
capture important stats.
• Respect user settings: Users may opt out of receiving notifications. Our system checks
user settings before sending notifications.
• Rate limiting: Users will appreciate a frequency cap on the number of notifications
they receive.
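The frequency cap mentioned in the last bullet could look something like the sketch below. It uses a sliding window per user; the window approach, the limit, and the window length are assumptions chosen for illustration, and `FrequencyCap` is a hypothetical name. State is held in memory, so this only shows the logic, not a shared production limiter.

```python
import time
from collections import defaultdict, deque


class FrequencyCap:
    """Allow at most `limit` notifications per user in any sliding window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # user_id -> send timestamps

    def allow(self, user_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        timestamps = self._sent[user_id]
        # Drop timestamps that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

A worker would call `allow(user_id)` before sending and skip or defer the notification when it returns False.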