CDN Architecture Explained: How Multi-Level Caching Works

David Wang

In the world of content delivery, speed is everything. But how do Content Delivery Networks (CDNs) actually achieve such high speeds? The secret lies in their caching architecture.

The Three-Level Cache Strategy

Our CDN Engine uses a three-level cache that places each object on the tier matching its popularity, so hot data is served from the fastest tier while cold data still fits in high-capacity storage.

  • L1 (Memory): Ultra-fast access for the most popular content.
  • L2 (SSD): High-speed storage for warm data.
  • L3 (HDD): Massive capacity for long-tail content.
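To make the lookup order concrete, here is a minimal sketch of how a request might walk the three tiers. It is an illustration under stated assumptions, not CDN Engine's actual implementation: the names `LRUTier` and `TieredCache` are hypothetical, each tier is modeled as an in-memory LRU with a fixed entry count, and the promote-on-hit and demote-on-evict policies are assumptions that the article does not specify.

```python
from collections import OrderedDict


class LRUTier:
    """One cache tier (hypothetical): fixed entry capacity, LRU eviction."""

    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Store the value; return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # least recently used
        return None


class TieredCache:
    """Looks up L1, then L2, then L3, and falls back to the origin."""

    def __init__(self, tiers, fetch_origin):
        self.tiers = tiers                # ordered fastest to slowest
        self.fetch_origin = fetch_origin  # callable: key -> bytes

    def _insert(self, level, key, value):
        # Assumed policy: whatever a tier evicts is demoted one tier down.
        # Entries evicted from the last tier are simply dropped.
        while level < len(self.tiers):
            evicted = self.tiers[level].put(key, value)
            if evicted is None:
                return
            key, value = evicted
            level += 1

    def get(self, key):
        """Return (value, source), where source names the tier that hit."""
        for level, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                if level > 0:
                    # Assumed policy: a hit in a slower tier promotes to L1.
                    del tier.items[key]
                    self._insert(0, key, value)
                return value, tier.name
        value = self.fetch_origin(key)
        self._insert(0, key, value)
        return value, "origin"
```

With this policy, content that keeps getting requested stays in L1, while content that stops being requested drifts down to L2 and L3 before it is dropped. A production system would more likely size tiers in bytes and use admission rules rather than a plain entry count.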

Request Coalescing (SingleFlight)

In high-concurrency scenarios, a single popular file (such as a viral video or a breaking-news image) can attract thousands of concurrent requests within moments. If the file is not yet cached, every one of those requests misses the cache and goes to the origin server, which can overwhelm it. This is known as a cache stampede.

CDN Engine handles this with a SingleFlight mechanism. When multiple requests arrive for the same uncached resource at the same time, the system coalesces them into a single origin request. Once that request retrieves the data and writes it to the cache, all waiting requests are served from the cache. Thousands of would-be origin requests can collapse into just one, which shields the origin server from the spike.
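The coalescing idea can be sketched in a few lines of Python with threads. This is a simplified illustration, not CDN Engine's code: the `SingleFlight` class and the `get_with_coalescing` helper are hypothetical names, the cache is a plain dict, and waiting callers receive the leader's result directly rather than re-reading the cache.

```python
import threading


class _Call:
    """State shared by every caller waiting on one in-flight fetch."""

    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None


class SingleFlight:
    """Ensures only one call per key is in flight at any moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._calls[key] = call
        if leader:
            try:
                call.value = fn()  # only the leader runs the function
            except Exception as exc:
                call.error = exc   # followers will see the same failure
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()    # wake up every waiting follower
        else:
            call.done.wait()       # followers block until the leader is done
        if call.error is not None:
            raise call.error
        return call.value


def get_with_coalescing(cache, flight, key, fetch_origin):
    """Serve from cache; on a miss, coalesce concurrent origin fetches."""
    value = cache.get(key)
    if value is not None:
        return value

    def fill():
        # Re-check: another leader may have filled the cache just before us.
        cached = cache.get(key)
        if cached is not None:
            return cached
        fetched = fetch_origin(key)
        cache[key] = fetched
        return fetched

    return flight.do(key, fill)
```

If 50 threads call `get_with_coalescing` for the same uncached key at once, one of them becomes the leader and contacts the origin while the rest wait on the shared event. The re-check inside `fill` covers the narrow window where a thread misses the cache just before the leader finishes.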

Slice-Based Caching Technology

For multi-gigabyte files (such as installation packages or HD videos), traditional whole-file caching is inefficient: the whole file has to be fetched and stored even when a user needs only part of it, which consumes excessive disk space. We instead use slice-based caching, splitting large files into fixed-size chunks (e.g., 1 MB or 4 MB).

  • On-demand Origin Fetch: Only the chunks a user actually needs are loaded, for example when the user seeks within a video.
  • Parallel Downloading: Edge nodes can request different chunks from the origin in parallel, which speeds up the fetch.
  • Storage Optimization: Infrequently accessed chunks can be evicted individually instead of evicting the entire file.
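The chunk arithmetic behind these points can be sketched as follows. This is an illustration under assumptions, not CDN Engine's implementation: the function names, the `(url, chunk_index)` cache key, and the `fetch_chunk(url, first, last)` callback (standing in for a ranged request to the origin) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, one of the example sizes in the text


def chunks_for_range(start, end, chunk_size=CHUNK_SIZE):
    """Return the chunk indexes covering the inclusive byte range [start, end]."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return list(range(start // chunk_size, end // chunk_size + 1))


def chunk_byte_range(index, file_size, chunk_size=CHUNK_SIZE):
    """Return the inclusive (first, last) byte offsets of one chunk."""
    first = index * chunk_size
    last = min(first + chunk_size, file_size) - 1  # last chunk may be short
    return first, last


def read_range(url, start, end, file_size, cache, fetch_chunk,
               chunk_size=CHUNK_SIZE, max_workers=4):
    """Serve bytes [start, end] of url, fetching only the missing chunks."""
    end = min(end, file_size - 1)
    needed = chunks_for_range(start, end, chunk_size)
    missing = [i for i in needed if (url, i) not in cache]

    def fetch(i):
        first, last = chunk_byte_range(i, file_size, chunk_size)
        return i, fetch_chunk(url, first, last)

    # Missing chunks are requested from the origin in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for i, data in pool.map(fetch, missing):
            cache[(url, i)] = data

    # Stitch the chunks together, then trim to the exact requested range.
    body = b"".join(cache[(url, i)] for i in needed)
    offset = start - needed[0] * chunk_size
    return body[offset:offset + (end - start + 1)]
```

Because each chunk is cached under its own key, an eviction policy can drop a single cold chunk without touching the rest of the file, and a seek into the middle of a video only triggers fetches for the chunks that cover the requested bytes.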

This combination of tiering and slicing lets us balance cost and performance: users get fast responses, and storage is used efficiently.