Expected topics
- Web Crawler
- Crawler Pattern
- worker
- content
- FIFO queue
- crawler trap
- politeness
- domain
- Bloom filter
- false positive
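To see how several of these topics fit together, here is a minimal sketch, not a reference design. It assumes a single-process frontier: a FIFO queue of URLs, a Bloom filter for "already seen" checks, a fixed per-domain politeness delay, and a depth cap as a crude guard against crawler traps. The names BloomFilter, Frontier, delay_seconds, and max_depth are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import deque
from urllib.parse import urlsplit


class BloomFilter:
    """Fixed-size bit set: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Frontier:
    """Roughly FIFO URL queue with dedup, a depth cap, and per-domain delay."""

    def __init__(self, seen: BloomFilter, delay_seconds: float, max_depth: int) -> None:
        self.queue = deque()       # (url, depth) pairs
        self.seen = seen
        self.delay = delay_seconds
        self.max_depth = max_depth
        self.next_allowed = {}     # domain -> earliest time of the next fetch

    def push(self, url: str, depth: int = 0) -> bool:
        # A false positive here means a never-seen URL is skipped.
        if depth > self.max_depth or url in self.seen:
            return False
        self.seen.add(url)
        self.queue.append((url, depth))
        return True

    def pop(self, now: float):
        """Return the first (url, depth) whose domain is ready, else None."""
        for _ in range(len(self.queue)):
            url, depth = self.queue.popleft()
            domain = urlsplit(url).hostname or ""
            if now >= self.next_allowed.get(domain, 0.0):
                self.next_allowed[domain] = now + self.delay
                return url, depth
            self.queue.append((url, depth))  # not ready yet: rotate to the back
        return None
```

A worker would call pop with the current time, fetch the URL, extract links from the content, and push each link with depth + 1. The trade-off to be ready to discuss: a Bloom filter false positive silently drops a URL that was never crawled, in exchange for a much smaller memory footprint than an exact set.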
Self-check prompts
- What users, scale, latency, availability, and consistency requirements should you clarify for a web crawler?
- What are the main APIs, data model, and request flow?
- Where is the main bottleneck in the Crawler Pattern (workers, content handling), and how would you scale it?
- What failure mode matters most, and how do retry, recovery, and idempotency work?
- Which trade-off would you choose, what do you lose, and when would you change that decision?
Common mistakes
- Jumping into vendor names before clarifying requirements and scale.
- Listing components without explaining the end-to-end request flow.
- Leaving the bottleneck vague instead of quantifying capacity, partitioning, and recovery behavior.
- Mentioning trade-offs without choosing an option and explaining the condition that would change the decision.
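As one illustration of quantifying capacity rather than leaving it vague, the standard Bloom filter sizing formulas can be turned into a quick estimate. This is a sketch under stated assumptions: the helper name bloom_sizing is hypothetical, and the workload figures in the usage lines (one billion URLs, 1% false positive rate) are made-up inputs, not numbers from any real system.

```python
import math


def bloom_sizing(expected_items: int, false_positive_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the standard formulas:
    m = -n * ln(p) / (ln 2)^2  and  k = (m / n) * ln 2.
    """
    if expected_items <= 0 or not 0.0 < false_positive_rate < 1.0:
        raise ValueError("need expected_items > 0 and 0 < false_positive_rate < 1")
    bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hashes = max(1, round(bits / expected_items * math.log(2)))
    return bits, hashes


# Assumed workload for illustration only: 1 billion URLs, 1% false positives.
bits, hashes = bloom_sizing(1_000_000_000, 0.01)
print(f"{bits / 8 / 1e9:.2f} GB of bits, {hashes} hash functions")
```

Under those assumed inputs the filter needs roughly 1.2 GB and 7 hash functions. A concrete figure like this is what lets you state whether the seen-URL set fits on one node or has to be partitioned, for example by domain.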