URL Shortener

Design a scalable URL shortener for 100M daily active users.

Patterns

  • Unique ID Generation
  • Read-Heavy Caching

Expected topics

  • Short code generation: counter + base62 vs hash, and collision handling
  • Custom alias reservation and uniqueness enforcement
  • Redirect path latency budget and cache-first lookup
  • 301 vs 302 redirect choice and its impact on analytics and caching
  • Read/write ratio estimation and storage sizing for billions of links
  • Database schema, indexing, and partitioning by short code
  • Hot key handling for viral links (cache, CDN/edge, replication)
  • Click analytics ingestion without slowing the redirect path
  • Link expiration, TTL cleanup, and abuse/malicious URL controls
  • Availability vs consistency trade-off for read-after-write behavior
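
To make the "counter + base62" option in the first topic concrete, here is a minimal sketch. It assumes a globally unique integer ID already exists (for example from a database sequence or a range allocator, which is not shown). The alphabet order and the names encode_base62 and decode_base62 are illustrative choices, not a prescribed scheme.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
# Any fixed ordering works as long as encoder and decoder agree.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative counter value into a short code."""
    if n < 0:
        raise ValueError("counter value must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Invert encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


if __name__ == "__main__":
    print(encode_base62(125))          # a small counter value
    print(len(encode_base62(62**7 - 1)))  # largest value that fits in 7 characters
```

Because each counter value maps to exactly one code, this approach needs no collision handling. The trade-off is that codes are sequential and therefore guessable unless the counter is permuted or offset first. Seven characters cover 62^7, about 3.5 trillion codes.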

Self-check prompts

  • What requirements should you clarify first: DAU, read/write ratio, latency target for redirects, custom alias support, analytics needs, and link lifetime?
  • How do you generate collision-free short codes at scale, and why would you pick counter + base62 over hashing (or vice versa)?
  • What is the end-to-end redirect flow, and how do cache, database, and CDN/edge each keep p99 latency low on the read path?
  • How do you record click analytics for 100M DAU without adding latency to redirects, and what happens if the analytics pipeline falls behind?
  • Which consistency trade-off do you accept (e.g. newly created link briefly unresolvable on a replica), and what condition would change that decision?
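
The redirect-flow and analytics prompts above can be explored with a small sketch. It is a single-process illustration under stated assumptions: plain dicts stand in for the cache and the database, a bounded in-memory queue stands in for a real event stream, and every name here (CACHE, DATABASE, CLICK_EVENTS, handle_redirect) is hypothetical.

```python
import queue
import time
from typing import Optional, Tuple

# Stand-ins for real infrastructure (assumptions for illustration only).
CACHE = {}                                                 # e.g. an in-memory cache tier
DATABASE = {"abc123": "https://example.com/long/path"}     # source of truth
CLICK_EVENTS = queue.Queue(maxsize=10_000)                 # e.g. a queue/stream producer buffer
dropped_events = 0


def resolve(code: str) -> Optional[str]:
    """Cache-first lookup: hit the database only on a cache miss."""
    url = CACHE.get(code)
    if url is None:
        url = DATABASE.get(code)
        if url is not None:
            CACHE[code] = url  # populate the cache for later requests
    return url


def record_click(code: str) -> None:
    """Enqueue a click event without blocking the redirect."""
    global dropped_events
    try:
        CLICK_EVENTS.put_nowait({"code": code, "ts": time.time()})
    except queue.Full:
        # Pipeline is behind: shed the event rather than slow the redirect.
        dropped_events += 1


def handle_redirect(code: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a short code."""
    url = resolve(code)
    if url is None:
        return 404, None
    record_click(code)
    # 302 is used here so each click reaches the server and can be counted;
    # a 301 may be cached by browsers and bypass the service on repeat visits.
    return 302, url


if __name__ == "__main__":
    print(handle_redirect("abc123"))
    print(handle_redirect("missing"))
```

Dropping events when the buffer is full is only one possible answer to "what happens if the analytics pipeline falls behind". Sampling, or spilling to local disk, are alternatives worth discussing. The point of the sketch is that the redirect never waits on the analytics write.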

Common mistakes

  • Skipping the read/write ratio estimate: the whole design hinges on redirects being roughly 100-1000x more frequent than link creation.
  • Hashing the long URL and ignoring collisions, or re-hashing in a loop without explaining uniqueness guarantees.
  • Putting analytics writes synchronously on the redirect path instead of emitting events to a queue/stream.
  • Choosing 301 vs 302 arbitrarily without connecting it to browser caching and click-tracking requirements.
  • Treating the database as the redirect hot path instead of designing cache-first with a clear miss/invalidation story.
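
The read/write estimate named in the first mistake takes only a few lines of arithmetic. The sketch below uses the 100M DAU figure from the prompt and the low end of the 100-1000x ratio. The remaining inputs (0.1 links created per user per day, 500 bytes per stored link, 5 years of retention) are assumptions made up for illustration and should be replaced with whatever the interviewer confirms.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, links_per_user_per_day, read_write_ratio,
             bytes_per_link, retention_years):
    """Back-of-envelope traffic and storage numbers for a URL shortener."""
    writes_per_day = dau * links_per_user_per_day
    reads_per_day = writes_per_day * read_write_ratio
    total_links = writes_per_day * 365 * retention_years
    return {
        "write_qps": writes_per_day / SECONDS_PER_DAY,   # average, not peak
        "read_qps": reads_per_day / SECONDS_PER_DAY,     # average, not peak
        "total_links": total_links,
        "storage_tb": total_links * bytes_per_link / 1e12,
    }


if __name__ == "__main__":
    result = estimate(
        dau=100_000_000,             # from the prompt
        links_per_user_per_day=0.1,  # assumption
        read_write_ratio=100,        # low end of the 100-1000x range
        bytes_per_link=500,          # assumption: URL + metadata + index overhead
        retention_years=5,           # assumption
    )
    for name, value in result.items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the averages come out to about 116 writes per second, about 11,600 redirects per second, roughly 18 billion stored links, and around 9 TB of raw storage. Peak traffic is typically a multiple of the average, which is the part that motivates the cache-first read path.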