Patterns
- Unique ID Generation
- Read-Heavy Caching
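For the Unique ID Generation pattern, here is a minimal sketch of counter + base62. It assumes a fixed alphabet order (digits, lowercase, uppercase) and uses a hypothetical `RangeAllocator` that leases blocks of IDs from a shared counter, so a server touches that counter once per block instead of once per link. The shared counter is simulated by a plain integer; in practice it might be a database sequence or a coordination service.

```python
import string
import threading

# Assumed alphabet order (digits, lowercase, uppercase); any fixed order works
# as long as the encoder and decoder agree.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62
_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_base62(n: int) -> str:
    """Encode a non-negative integer ID as a base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62."""
    n = 0
    for ch in code:
        n = n * BASE + _INDEX[ch]
    return n


class RangeAllocator:
    """Hypothetical ID source: leases blocks of IDs from a shared counter.
    The shared counter is simulated by a plain integer here."""

    def __init__(self, block_size: int = 1000) -> None:
        self._block_size = block_size
        self._shared_next = 1  # stand-in for a DB sequence or coordination service
        self._lock = threading.Lock()
        self._next = 0
        self._end = 0  # exclusive end of the current block

    def next_id(self) -> int:
        with self._lock:
            if self._next >= self._end:
                # Lease a fresh block; IDs left unused in a block are skipped, never reused.
                self._next = self._shared_next
                self._end = self._shared_next + self._block_size
                self._shared_next = self._end
            value = self._next
            self._next += 1
            return value
```

Usage: `encode_base62(allocator.next_id())` yields a code that is unique because the underlying IDs are unique, so no collision check is needed. Seven characters cover 62^7 (about 3.5 trillion) IDs. The trade-off is that sequential IDs produce guessable, enumerable codes; if that matters, some designs permute the ID before encoding.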
Expected topics
- Short code generation: counter + base62 vs hash, and collision handling
- Custom alias reservation and uniqueness enforcement
- Redirect path latency budget and cache-first lookup
- 301 vs 302 redirect choice and its impact on analytics and caching
- Read/write ratio estimation and storage sizing for billions of links
- Database schema, indexing, and partitioning by short code
- Hot key handling for viral links (cache, CDN/edge, replication)
- Click analytics ingestion without slowing the redirect path
- Link expiration, TTL cleanup, and abuse/malicious URL controls
- Availability vs consistency trade-off for reads immediately after a write
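The read/write ratio and storage sizing can be worked as a quick back-of-envelope calculation. Every input below is a hypothetical placeholder (10M new links per day, 100 redirects per creation, about 500 bytes per stored row, 5-year retention) to be replaced with the numbers clarified up front; the function just does the arithmetic.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Assumptions:
    # Illustrative placeholders only; replace with clarified requirements.
    new_links_per_day: int = 10_000_000
    read_write_ratio: int = 100   # redirects per link creation
    bytes_per_record: int = 500   # assumed: code + long URL + metadata + index overhead
    retention_years: int = 5


def estimate(a: Assumptions) -> dict:
    write_qps = a.new_links_per_day / SECONDS_PER_DAY
    read_qps = write_qps * a.read_write_ratio
    total_links = a.new_links_per_day * 365 * a.retention_years
    storage_tb = total_links * a.bytes_per_record / 1e12
    # Shortest base62 code length whose keyspace covers every retained link.
    code_len = 1
    while 62 ** code_len < total_links:
        code_len += 1
    return {
        "write_qps": write_qps,
        "read_qps": read_qps,
        "total_links": total_links,
        "storage_tb": storage_tb,
        "code_len": code_len,
    }


if __name__ == "__main__":
    for key, value in estimate(Assumptions()).items():
        print(f"{key}: {value:,.1f}" if isinstance(value, float) else f"{key}: {value:,}")
```

With these placeholders the result is roughly 116 writes/s and 11,600 redirects/s on average, about 18 billion links and 9 TB over five years, and a minimum code length of 6 characters. These are averages; peak traffic is higher, and choosing 7 characters leaves headroom.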
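Custom alias reservation and the schema can be sketched together, using Python's `sqlite3` as a stand-in for the real database. The table and column names, the alias rule (3 to 30 characters from `[A-Za-z0-9_-]`), and `shard_for` are all hypothetical. The key idea is that uniqueness is enforced by the `PRIMARY KEY` on `short_code` inside one atomic `INSERT`, rather than by a read-then-write check that two concurrent requests could both pass. Because generated codes live in the same table, a generated code also cannot silently overwrite an alias.

```python
import hashlib
import sqlite3
import string
import time
from typing import Optional

# Hypothetical alias rule: 3-30 chars from [A-Za-z0-9_-].
ALIAS_CHARS = set(string.ascii_letters + string.digits + "-_")

SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    short_code TEXT PRIMARY KEY,  -- uniqueness and point-lookup index in one
    long_url   TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    expires_at INTEGER            -- NULL = never expires
)
"""


def reserve_alias(conn: sqlite3.Connection, alias: str, long_url: str,
                  ttl_seconds: Optional[int] = None) -> bool:
    """Claim a custom alias; return False if it is already taken."""
    if not 3 <= len(alias) <= 30 or not set(alias) <= ALIAS_CHARS:
        raise ValueError("invalid alias")
    now = int(time.time())
    expires_at = now + ttl_seconds if ttl_seconds is not None else None
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO links (short_code, long_url, created_at, expires_at) "
                "VALUES (?, ?, ?, ?)",
                (alias, long_url, now, expires_at),
            )
        return True
    except sqlite3.IntegrityError:
        # PRIMARY KEY violation: the code is already owned. No check-then-insert race.
        return False


def shard_for(short_code: str, num_shards: int) -> int:
    """Route a code to a partition with a stable hash
    (Python's built-in hash() of str is salted per process)."""
    digest = hashlib.sha256(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Partitioning by a hash of the short code keeps every redirect a single-key lookup on one shard. Plain modulo routing moves most keys when `num_shards` changes; consistent hashing is the usual way to reduce that reshuffling.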
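For hot keys, a viral link can concentrate traffic on the one shared-cache node that owns its key. One mitigation is a small per-process "near cache": an LRU with a short TTL in front of the shared cache, so repeated hits on the same code never leave the app server. The class name, capacity, and TTL below are hypothetical. CDN/edge caching of the redirect response and replicating the hot key across cache nodes are complementary options.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class NearCache:
    """Per-process LRU with a short TTL (single-threaded sketch; add a lock
    for a threaded server). Capacity and TTL values are hypothetical."""

    def __init__(self, capacity: int = 10_000, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: "OrderedDict[str, Tuple[float, object]]" = OrderedDict()

    def get_or_load(self, key: str, loader: Callable[[str], object]) -> object:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[0] > now:
            self._data.move_to_end(key)  # mark as most recently used
            return entry[1]
        value = loader(key)  # miss or expired: fall through to shared cache / DB
        self._data[key] = (now + self._ttl, value)
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict least recently used
        return value
```

The short TTL is what bounds staleness: a link that is deleted or blocked for abuse can keep redirecting from a near cache for up to `ttl_seconds`. This sketch also does not coalesce concurrent misses, so many requests arriving right after expiry can all call the loader.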
Self-check prompts
- What requirements should you clarify first: DAU, read/write ratio, latency target for redirects, custom alias support, analytics needs, and link lifetime?
- How do you generate collision-free short codes at scale, and why would you pick counter + base62 over hashing (or vice versa)?
- What is the end-to-end redirect flow, and how do cache, database, and CDN/edge each keep p99 latency low on the read path?
- How do you record click analytics for 100M DAU without adding latency to redirects, and what happens if the analytics pipeline falls behind?
- Which consistency trade-off do you accept (e.g. newly created link briefly unresolvable on a replica), and what condition would change that decision?
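The redirect flow can be sketched as cache-first with the database read only on a miss. `TTLCache` is a hypothetical in-process stand-in for a shared cache such as Redis, and the `db` dict stands in for the links table. Expiration is checked lazily at read time; a background TTL cleanup job would still be needed to reclaim storage. Misses are cached too (negative caching), which protects the database from floods of unknown codes but also illustrates the read-after-write trade-off: a code looked up just before it is created stays unresolvable until that cache entry expires, so real designs often give negative entries a shorter TTL or populate the cache on create.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Link:
    long_url: str
    expires_at: Optional[float] = None  # epoch seconds; None = never expires


class TTLCache:
    """Stand-in for a shared cache; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._data: Dict[str, Tuple[float, Optional[Link]]] = {}

    def get(self, key: str) -> Tuple[Optional[Link], bool]:
        """Return (value, hit). A hit may carry None, i.e. a cached 'not found'."""
        entry = self._data.get(key)
        if entry is None or self._clock() >= entry[0]:
            self._data.pop(key, None)
            return None, False
        return entry[1], True

    def set(self, key: str, value: Optional[Link]) -> None:
        self._data[key] = (self._clock() + self._ttl, value)


def resolve(code: str, cache: TTLCache, db: Dict[str, Link],
            clock: Callable[[], float] = time.time,
            track_clicks: bool = True) -> Tuple[int, Optional[str]]:
    """Cache-first redirect: return (HTTP status, Location or None)."""
    link, hit = cache.get(code)
    if not hit:
        link = db.get(code)    # source of truth, read only on a cache miss
        cache.set(code, link)  # cache misses too, so unknown codes don't hammer the DB
    if link is None or (link.expires_at is not None and clock() >= link.expires_at):
        return 404, None
    # 302: the browser comes back on every click, so the server can count it.
    # 301: browsers may cache the redirect, cutting load but hiding repeat clicks.
    return (302 if track_clicks else 301), link.long_url
```

The `track_clicks` flag makes the 301 vs 302 decision explicit: it is a choice between offloading repeat traffic to browser caches and keeping every click visible for analytics.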
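For click analytics, the redirect handler can do nothing more than a non-blocking in-memory enqueue, while a background thread batches events to a downstream stream. In this hypothetical `ClickEmitter`, `sink` stands in for a producer to a queue/stream service (e.g. Kafka or Kinesis). When the pipeline falls behind and the bounded buffer fills, it sheds events and counts them rather than slowing redirects. Sampling or spilling to local disk are alternative policies, and an in-memory buffer loses its contents if the process crashes.

```python
import queue
import threading
import time
from typing import Callable, List


class ClickEmitter:
    """Non-blocking click recorder for the redirect path (hypothetical sketch)."""

    def __init__(self, sink: Callable[[List[dict]], None],
                 maxsize: int = 10_000, batch_size: int = 500) -> None:
        self._q: "queue.Queue[dict]" = queue.Queue(maxsize=maxsize)
        self._sink = sink
        self._batch_size = batch_size
        self._dropped = 0
        self._dropped_lock = threading.Lock()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    @property
    def dropped(self) -> int:
        with self._dropped_lock:
            return self._dropped

    def record(self, short_code: str) -> None:
        """Called once per redirect; never blocks."""
        try:
            self._q.put_nowait({"code": short_code, "ts": time.time()})
        except queue.Full:
            # Pipeline is behind: shed the event instead of delaying the redirect.
            with self._dropped_lock:
                self._dropped += 1

    def _run(self) -> None:
        # Keep going until shutdown is requested AND the buffer is drained.
        while not (self._stop.is_set() and self._q.empty()):
            try:
                batch = [self._q.get(timeout=0.1)]
            except queue.Empty:
                continue
            while len(batch) < self._batch_size:
                try:
                    batch.append(self._q.get_nowait())
                except queue.Empty:
                    break
            self._sink(batch)

    def close(self, timeout: float = 5.0) -> None:
        """Signal shutdown, let the worker drain what is queued, and wait for it."""
        self._stop.set()
        self._worker.join(timeout)
```

The `dropped` count is the signal to alert on: if it climbs steadily, analytics consumers need more capacity, but redirect latency is unaffected either way.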
Common mistakes
- Skipping the read/write ratio estimate — the whole design hinges on redirects being ~100-1000x more frequent than link creation.
- Hashing the long URL and ignoring collisions, or re-hashing in a loop without explaining uniqueness guarantees.
- Putting analytics writes synchronously on the redirect path instead of emitting events to a queue/stream.
- Choosing 301 vs 302 arbitrarily without connecting it to browser caching and click-tracking requirements.
- Treating the database as the redirect hot path instead of designing cache-first with a clear miss/invalidation story.
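To avoid the hashing mistake above, collision handling can be made explicit and bounded. This sketch derives a 7-character base62 code from SHA-256 of the URL plus an attempt number as a salt, tries at most `max_attempts` salts, and claims a code only if it is absent. The `store` dict and the `setdefault` call stand in for an atomic conditional insert (e.g. an `INSERT` guarded by a primary-key constraint); the function names and the dedupe-by-URL behavior are hypothetical design choices, not requirements.

```python
import hashlib
import string
from typing import Dict

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def candidate(long_url: str, attempt: int, length: int = 7) -> str:
    """Deterministic base62 code from SHA-256 of the URL plus a retry salt."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode()).digest()
    n = int.from_bytes(digest[:8], "big") % (62 ** length)
    chars = []
    for _ in range(length):  # fixed width, so small values are zero-padded
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def shorten_hashed(long_url: str, store: Dict[str, str],
                   length: int = 7, max_attempts: int = 5) -> str:
    """Return a code for long_url, resolving collisions with salted retries."""
    for attempt in range(max_attempts):
        code = candidate(long_url, attempt, length)
        # setdefault claims the code only if absent, mirroring an atomic
        # conditional insert in a real database.
        existing = store.setdefault(code, long_url)
        if existing == long_url:
            return code  # freshly claimed, or this URL already owns the code
        # A different URL owns this code: try the next salt.
    raise RuntimeError("too many collisions; consider a longer code")
```

The uniqueness argument is that a code is only ever assigned through the conditional claim, so two different URLs can never share one. Because candidates are deterministic, the same URL always re-finds its existing code, and the explicit attempt limit turns an unbounded re-hash loop into a clear failure mode.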