Photo by Terry Vlisidis
A simple Twitter design that walks through the key concepts of system design:
- Requirements and goals
- Storage and network capacity
- System APIs
- High level design
- Database Schema
- Data Sharding
Requirements and goals
- Post tweet
- Favorite tweet
- Follow user
- Highly available
- Low latency (fast loading)
Storage and network capacity
Let's say we have:
- 1 billion users
- 200M daily active users
- 100M tweets/day
We can allow users to tweet text with a maximum length of 200 characters.
Doing the math: 100M x (2 bytes per character x 200 characters + 40 bytes of metadata) = 100M x 440 bytes = 44 billion bytes (~44 GB) per day.
This means that in a year we would need ~(44 x 365) GB = ~16 TB of data storage.
Let tweet_size = 440 bytes.
If we get about 10 billion tweet views/day, we would need to transfer 10B x tweet_size / (24 x 60 x 60 s) = ~51 MB/s (roughly 400 Mbit/s) of network capacity.
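The estimates above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constant names are ours, and the figures are the ones assumed in this section.

```python
# Back-of-the-envelope capacity estimate using the figures assumed in the text.
TWEETS_PER_DAY = 100_000_000
MAX_CHARS = 200
BYTES_PER_CHAR = 2
META_BYTES = 40
VIEWS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Size of one tweet: text plus metadata.
tweet_size = MAX_CHARS * BYTES_PER_CHAR + META_BYTES

# Storage needed for new tweets.
daily_storage_bytes = TWEETS_PER_DAY * tweet_size
yearly_storage_bytes = daily_storage_bytes * 365

# Average outbound bandwidth needed to serve tweet views, in bytes per second.
read_bandwidth = VIEWS_PER_DAY * tweet_size / SECONDS_PER_DAY

print(f"tweet size: {tweet_size} bytes")
print(f"storage per day: {daily_storage_bytes / 1e9:.0f} GB")
print(f"storage per year: {yearly_storage_bytes / 1e12:.1f} TB")
print(f"read bandwidth: {read_bandwidth / 1e6:.1f} MB/s")
```

Note that this is an average rate; real traffic has peaks, so the provisioned capacity would need some headroom on top of it.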
System APIs
We can use either SOAP or REST to expose our services. We are going to use REST because it is lightweight, flexible to implement, and easy to learn.
POST: tweet(key, tweet_data, location) returns the URL of the new tweet
GET: tweet(key, tweet_id) returns information about the tweet as JSON
The key identifies who is accessing our services.
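To make the two calls concrete, here is a minimal in-memory sketch of what the handlers behind them could look like. Everything beyond the call signatures is an assumption made for illustration: the names `post_tweet`, `get_tweet`, `API_KEYS` and `TWEETS`, the `/tweets/<id>` URL shape, and the use of a plain dict instead of a real database.

```python
import itertools
import json
import time

MAX_TWEET_CHARS = 200  # limit taken from the capacity section

# Hypothetical in-memory stand-ins for the key registry and the tweet store.
API_KEYS = {"demo-key": "user-1"}  # API key -> user id
TWEETS = {}                        # tweet id -> tweet record
_next_id = itertools.count(1)


def _authenticate(key):
    """Resolve an API key to a user id, or raise if the key is unknown."""
    if key not in API_KEYS:
        raise PermissionError("unknown API key")
    return API_KEYS[key]


def post_tweet(key, tweet_data, location=None):
    """POST tweet: store the tweet and return the URL of the new tweet."""
    user_id = _authenticate(key)
    if len(tweet_data) > MAX_TWEET_CHARS:
        raise ValueError("tweet is longer than 200 characters")
    tweet_id = next(_next_id)
    TWEETS[tweet_id] = {
        "id": tweet_id,
        "userId": user_id,
        "text": tweet_data,
        "location": location,
        "createdAt": time.time(),
    }
    return f"/tweets/{tweet_id}"  # assumed URL shape


def get_tweet(key, tweet_id):
    """GET tweet: return information about the tweet as a JSON string."""
    _authenticate(key)
    if tweet_id not in TWEETS:
        raise KeyError(f"no tweet with id {tweet_id}")
    return json.dumps(TWEETS[tweet_id])


url = post_tweet("demo-key", "Hello, world!", location=[52.37, 4.90])
print(url)
print(get_tweet("demo-key", int(url.rsplit("/", 1)[1])))
```

In a real service these functions would sit behind HTTP routes, and the key check would be the place to add rate limiting per caller.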
High level design
A request flows through: client > load balancer > server cluster > database/file storage
Database Schema
tweet: id, userId, lat, lon, createdAt, numOfFavorites
user: id, name, email, dob, lastLogin
userFollow: follower, following
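The three tables can be sketched as Python dataclasses. The field names come from the schema above; the field types and the helper `followers_of` are our assumptions, since the schema does not specify them.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Tweet:
    id: int
    userId: int            # author of the tweet
    lat: float             # latitude where the tweet was posted
    lon: float             # longitude where the tweet was posted
    createdAt: datetime
    numOfFavorites: int = 0


@dataclass
class User:
    id: int
    name: str
    email: str
    dob: date              # date of birth
    lastLogin: datetime


@dataclass(frozen=True)
class UserFollow:
    follower: int          # id of the user who follows
    following: int         # id of the user being followed


def followers_of(user_id, follows):
    """Hypothetical helper: ids of the users who follow user_id."""
    return [f.follower for f in follows if f.following == user_id]


follows = [UserFollow(1, 2), UserFollow(3, 2), UserFollow(2, 1)]
print(followers_of(2, follows))
```

Keeping the follow relation in its own table lets one user follow many others, and be followed by many, without touching the user rows.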
Data Sharding
We can split the data across different machines.
When an app server sends a request, we can use a balancer or a hash function to determine which machine holds the requested data:
server > balancer / hash function > A-N storage or O-Z storage
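Both routing options can be sketched in a few lines. The shard names and the function names `shard_by_range` and `shard_by_hash` are hypothetical, and the choice of SHA-256 is ours; any stable hash would do.

```python
import hashlib

# Hypothetical shard names matching the A-N / O-Z split described above.
SHARDS = ["storage-A-N", "storage-O-Z"]


def shard_by_range(username):
    """Range-based routing on the first letter of a non-empty username.

    Names starting with A-N go to the first shard. Everything else,
    including names that start with a digit or symbol, goes to the second.
    """
    first = username[0].upper()
    return SHARDS[0] if "A" <= first <= "N" else SHARDS[1]


def shard_by_hash(user_id, shards=SHARDS):
    """Hash-based routing: a stable hash of the id, modulo the shard count.

    hashlib is used instead of the built-in hash(), because hash() of a
    string differs between Python processes by default.
    """
    digest = hashlib.sha256(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


print(shard_by_range("alice"))
print(shard_by_range("zoe"))
print(shard_by_hash(42))
```

Range-based routing is easy to reason about but can load the shards unevenly if names cluster on a few letters. Hash-based routing spreads keys more evenly, although with a plain modulo most keys move to a different shard when the number of shards changes.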
We can use an LRU (least recently used) cache to handle caching.
With it, we can check whether the requested tweet is in the cache before we query the database.
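A minimal sketch of this idea, assuming a small in-process cache built on Python's `collections.OrderedDict`. The names `LRUCache` and `fetch_tweet` are ours, and a plain dict stands in for the database.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


def fetch_tweet(tweet_id, cache, database):
    """Check the cache first and fall back to the database on a miss."""
    tweet = cache.get(tweet_id)
    if tweet is None:
        tweet = database[tweet_id]  # stand-in for a real database query
        cache.put(tweet_id, tweet)
    return tweet


database = {1: "first tweet", 2: "second tweet", 3: "third tweet"}
cache = LRUCache(capacity=2)
fetch_tweet(1, cache, database)
fetch_tweet(2, cache, database)
fetch_tweet(1, cache, database)  # tweet 1 becomes the most recently used
fetch_tweet(3, cache, database)  # cache is full, so tweet 2 is evicted
print(cache.get(2))
```

In production the cache would more likely be a shared service sitting between the app servers and the database, but the lookup order stays the same: cache first, database on a miss.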