Module 1: Data Storage

Object Storage & CDN

Store files (images, videos, documents) and deliver them globally with minimal latency.

1. The Warehouse & Delivery Analogy

Simple Analogy
Object Storage (S3): A massive warehouse that stores all your products. Space is cheap and effectively unlimited, but the warehouse sits in one location, far from most customers.

CDN: Small warehouses (edge locations) all over the world that hold copies of popular products. A customer in Tokyo gets the item from the Tokyo warehouse, not from US headquarters.

Together: Central storage + distributed delivery = fast access everywhere.

2. Object Storage

Object storage stores data as objects (files), each identified by a unique key. Unlike a file system with nested folders, the namespace is flat: each object has a key (a path-like string), data (the content), and metadata.
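
As a minimal sketch of this model (not any provider's API), the snippet below represents a bucket as a flat dictionary from key to object. The names `StoredObject`, `Bucket`, `put`, `get`, and `list_prefix` are hypothetical and chosen for illustration. It shows that "folders" are only a naming convention: listing a folder amounts to a prefix filter over keys.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: raw bytes plus free-form metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class Bucket:
    """Hypothetical in-memory stand-in for a bucket: a flat key -> object map."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # The key is an opaque string; slashes carry no special meaning here.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key):
        return self._objects[key]

    def list_prefix(self, prefix):
        # "Listing a folder" is a string-prefix filter over all keys.
        return sorted(k for k in self._objects if k.startswith(prefix))


bucket = Bucket()
bucket.put("images/user-123.jpg", b"...", content_type="image/jpeg")
bucket.put("images/user-456.jpg", b"...", content_type="image/jpeg")
bucket.put("docs/report.pdf", b"...", content_type="application/pdf")

print(bucket.list_prefix("images/"))
print(bucket.get("docs/report.pdf").metadata)
```

Real services expose the same idea over HTTP, typically with a prefix parameter on the list operation.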

Key Characteristics

Unlimited Scale: Store petabytes of data with no up-front capacity planning.
Durability: 99.999999999% (11 nines) durability. Data is replicated across locations.
Cheap: ~$0.023/GB/month for standard storage, much cheaper than database storage.
HTTP Access: Objects are accessed via a REST API, which makes integration easy.
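
To make the durability figure concrete, here is a rough back-of-the-envelope calculation. It assumes the 11 nines are read as an annual, per-object figure and treats losses as a simple average; the function names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% durability leaves an expected loss rate of 1e-11
# per object per year (assumption: the figure is annual and per object).
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_objects_lost_per_year(object_count):
    """Average number of objects lost per year for a given object count."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_expected_loss(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_objects_lost_per_year(object_count)


objects = 10_000_000
print(float(expected_objects_lost_per_year(objects)))  # 0.0001
print(int(years_per_expected_loss(objects)))  # 10000
```

In other words, with ten million stored objects you would expect to lose a single object about once every 10,000 years on average.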

Common Object Storage Services

Service | Provider | Features
Amazon S3 | AWS | Industry standard, most integrations
Cloud Storage | Google Cloud | Great ML/BigQuery integration
Azure Blob | Microsoft | Enterprise, Office 365 integration
R2 | Cloudflare | Zero egress fees, S3 compatible

3. Storage Classes

Different storage classes suit different access patterns. Colder classes cost less per GB to store, but data takes longer to retrieve.

Class | Retrieval time | Storage price | Typical use
Standard | Instant | $0.023/GB/month | Frequently accessed files
Infrequent Access | Instant | $0.0125/GB/month | Monthly backups, disaster recovery
Glacier | Minutes to hours | $0.004/GB/month | Archives, compliance data
Deep Archive | 12-48 hours | $0.00099/GB/month | Long-term archives, rarely accessed
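
The price gap is easier to judge with a quick calculation. The sketch below uses the per-GB prices listed above and covers storage cost only; request, retrieval, and egress fees are deliberately ignored, and the function name is hypothetical.

```python
# Storage prices in USD per GB per month, as listed above.
PRICE_PER_GB_MONTH = {
    "Standard": 0.023,
    "Infrequent Access": 0.0125,
    "Glacier": 0.004,
    "Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only cost; ignores request, retrieval, and egress fees."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


# Compare all classes for 10 TB, taken here as 10,000 GB.
for name in PRICE_PER_GB_MONTH:
    print(f"{name}: ${monthly_storage_cost(10_000, name):,.2f}/month")
```

For 10,000 GB this works out to roughly $230 per month in Standard versus about $10 in Deep Archive, which is why archives that are rarely read belong in the colder classes.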

4. CDN (Content Delivery Network)

A CDN caches content at edge locations worldwide. Users fetch from the nearest edge server instead of the origin, which can cut latency from 200ms+ to under 50ms.
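
How much a CDN helps depends on the cache hit ratio. As a rough sketch, the average latency can be modeled as a weighted mix of edge and origin latency. The 20 ms and 200 ms values below are illustrative assumptions, and a miss is simplified to cost the full origin latency.

```python
def average_latency_ms(hit_ratio, edge_ms, origin_ms):
    """Expected latency when a fraction `hit_ratio` of requests is served from the edge."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Simplification: a miss is modeled as costing the full origin latency.
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * origin_ms


# Illustrative numbers only: 20 ms to the edge, 200 ms to the origin.
for ratio in (0.0, 0.5, 0.9, 0.99):
    print(ratio, round(average_latency_ms(ratio, edge_ms=20, origin_ms=200), 1))
```

Under these assumptions a 90% hit ratio brings the average to about 38 ms, so popular, cacheable content benefits the most.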

How CDN Works

User (Tokyo) → Edge (5ms): Cache HIT?
  Yes: Return cached content
  No: Fetch from Origin (US-East)
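
The hit/miss flow above can be sketched as a small in-memory cache. This is a simplified model, not how any particular CDN is implemented: `EdgeCache`, its TTL-based expiry, and the `origin` function are hypothetical names introduced for illustration.

```python
import time


class EdgeCache:
    """Hypothetical edge node: serves cached copies, falls back to the origin on a miss."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # path -> (body, expires_at)

    def get(self, path):
        entry = self._entries.get(path)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        # Miss or expired entry: fetch from the origin and cache the result.
        body = self._fetch(path)
        self._entries[path] = (body, now + self._ttl)
        return body, "MISS"

    def purge(self, path):
        # Invalidation: drop the cached copy so the next request goes to the origin.
        self._entries.pop(path, None)


origin_calls = []


def origin(path):
    # Stand-in for the origin server; records each request it receives.
    origin_calls.append(path)
    return f"content of {path}".encode()


edge = EdgeCache(origin)
print(edge.get("/images/cat.jpg")[1])  # MISS
print(edge.get("/images/cat.jpg")[1])  # HIT
edge.purge("/images/cat.jpg")
print(edge.get("/images/cat.jpg")[1])  # MISS
print(len(origin_calls))  # 2
```

Only misses reach the origin, which is what keeps both latency and origin load low for frequently requested paths.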

Popular CDN Providers

Provider | Edge locations | Notable features
Cloudflare | 300+ | Free tier, DDoS protection, Workers
AWS CloudFront | 400+ | S3 integration, Lambda@Edge
Fastly | 80+ | Real-time purging, edge compute
Akamai | 4000+ | Enterprise, largest network

5. Object Storage + CDN Architecture

Typical Setup for Images/Videos

1. Upload: User uploads image → App server → S3 bucket
2. Process: Trigger Lambda → Resize/compress → Store variants
3. Serve: CDN URL → Edge cache → Origin (S3) on miss
4. Invalidate: On update, purge CDN cache for affected paths
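
The upload, processing, and invalidation steps above can be sketched with a plain dictionary standing in for the bucket. Everything here is an assumption for illustration: the variant names and widths, the `_thumb`-style key naming, and `fake_resize`, which is a placeholder for real image processing.

```python
VARIANT_WIDTHS = {"thumb": 150, "medium": 800}  # hypothetical variant sizes in pixels


def variant_key(original_key, variant):
    """images/user-123.jpg + 'thumb' -> images/user-123_thumb.jpg"""
    stem, dot, ext = original_key.rpartition(".")
    if not dot:
        return f"{original_key}_{variant}"
    return f"{stem}_{variant}.{ext}"


def handle_upload(bucket, key, data, resize):
    """Steps 1-2: store the original, then store one resized variant per size."""
    bucket[key] = data
    written = [key]
    for variant, width in VARIANT_WIDTHS.items():
        vkey = variant_key(key, variant)
        bucket[vkey] = resize(data, width)
        written.append(vkey)
    return written


def paths_to_purge(key):
    """Step 4: CDN paths affected when the object at `key` is replaced."""
    return ["/" + key] + ["/" + variant_key(key, v) for v in VARIANT_WIDTHS]


def fake_resize(data, width):
    # Placeholder for real image processing; it only truncates the bytes.
    return data[:width]


bucket = {}
print(handle_upload(bucket, "images/user-123.jpg", b"raw-bytes", fake_resize))
print(paths_to_purge("images/user-123.jpg"))
```

Deriving variant keys from the original key keeps the purge list easy to compute: an update to one image maps to a known, small set of CDN paths.
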
URL Strategy
Use CDN URLs in your app: https://cdn.example.com/images/user-123-v2.jpg
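Include a version in the filename for easy cache busting: an updated file gets a new URL, so edges never serve a stale copy of it and no purge is needed.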
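
A minimal sketch of this URL strategy follows. `versioned_url` builds the `-v2` style name from the example above. `content_hashed_url` is a common alternative that this text does not describe, included here as an assumption: it derives the version from the file's bytes so the URL changes whenever the content does.

```python
import hashlib

CDN_BASE = "https://cdn.example.com"  # CDN hostname from the example URL above


def versioned_url(key, version):
    """Explicit version in the filename, e.g. images/user-123.jpg at version 2."""
    stem, dot, ext = key.rpartition(".")
    if not dot:
        return f"{CDN_BASE}/{key}-v{version}"
    return f"{CDN_BASE}/{stem}-v{version}.{ext}"


def content_hashed_url(key, data):
    """Alternative (an assumption, not from the text): version derived from the bytes."""
    digest = hashlib.sha256(data).hexdigest()[:8]
    stem, dot, ext = key.rpartition(".")
    if not dot:
        return f"{CDN_BASE}/{key}-{digest}"
    return f"{CDN_BASE}/{stem}-{digest}.{ext}"


print(versioned_url("images/user-123.jpg", 2))
# https://cdn.example.com/images/user-123-v2.jpg
print(content_hashed_url("images/user-123.jpg", b"new image bytes"))
```

With either scheme, old and new versions live at different URLs, so long cache lifetimes are safe and explicit purges are only needed for unversioned paths.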

6. When to Use What

Object Storage (S3)

  • User uploads (images, videos, documents)
  • Backup and archive data
  • Static website hosting
  • Data lake for analytics
  • Log storage

CDN

  • Static assets (CSS, JS, images)
  • Video streaming
  • API response caching
  • Global low-latency delivery
  • DDoS protection

7. Key Takeaways

1. Object Storage (S3): Cheap, durable, unlimited storage for files.
2. CDN caches content at edge locations for global low latency.
3. Use storage classes wisely: Standard for hot data, Glacier for archives.
4. Combine S3 + CDN: S3 as origin, CDN for delivery.
5. Cache invalidation: Version files (v2.jpg) or purge CDN cache.
6. In interviews: Discuss upload flow, processing pipeline, and delivery strategy.