Module 1: Data Storage
Object Storage & CDN
Store files (images, videos, documents) and deliver them globally with minimal latency.
1. The Warehouse & Delivery Analogy
💡 Simple Analogy
Object Storage (S3): A massive warehouse storing all your products. Cheap, with effectively unlimited space, but the warehouse is in one location, far from most customers.
CDN: Small warehouses (edge locations) all over the world with copies of popular products. A customer in Tokyo gets the item from the Tokyo warehouse, not from US headquarters.
Together: Central storage + distributed delivery = fast access everywhere.
2. Object Storage
Object Storage stores data as objects (files) with unique keys. Unlike file systems with folders, it's a flat structure where each object has a key (path), data (content), and metadata.
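The flat key/data/metadata model can be illustrated with a toy in-memory store. This is only a sketch: `FlatObjectStore`, `StoredObject`, and the example keys are hypothetical names invented for illustration, not any provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str                                      # full "path", e.g. "images/user-123.jpg"
    data: bytes                                   # the object's content
    metadata: dict = field(default_factory=dict)  # e.g. content type


class FlatObjectStore:
    """Toy in-memory stand-in for a bucket: one flat dict, no real folders."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        self._objects[key] = StoredObject(key, data, metadata)

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing is a prefix filter.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = FlatObjectStore()
store.put("images/user-123.jpg", b"...jpeg bytes...", content_type="image/jpeg")
store.put("images/user-456.jpg", b"...jpeg bytes...", content_type="image/jpeg")
store.put("docs/report.pdf", b"%PDF...", content_type="application/pdf")

print(store.list_keys("images/"))
```

Listing by the prefix `images/` returns only the two image keys, which is how a "folder" view is usually emulated on top of a flat namespace.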
Key Characteristics
- Unlimited Scale: Store petabytes of data. No capacity planning needed.
- Durability: 99.999999999% (11 nines) durability. Data replicated across locations.
- Cheap: ~$0.023/GB/month for standard storage. Much cheaper than databases.
- HTTP Access: Objects accessed via REST API. Easy integration.
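To get a feel for what 11 nines means, the figure can be turned into an expected loss rate. The sketch below assumes the number is read as a per-object, per-year durability, and the bucket size of ten million objects is a hypothetical example.

```python
# Annual probability of losing any single object at "11 nines" durability.
# Written directly as 1e-11 to avoid floating-point error in 1 - 0.99999999999.
ANNUAL_LOSS_PROBABILITY = 1e-11


def expected_objects_lost_per_year(object_count):
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count):
    return 1 / expected_objects_lost_per_year(object_count)


stored = 10_000_000  # hypothetical bucket with ten million objects
print(expected_objects_lost_per_year(stored))
print(years_per_lost_object(stored))
```

Under that assumption, ten million stored objects correspond to roughly one lost object every 10,000 years on average.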
Common Object Storage Services
| Service | Provider | Features |
|---|---|---|
| Amazon S3 | AWS | Industry standard, most integrations |
| Cloud Storage | Google Cloud | Great ML/BigQuery integration |
| Azure Blob | Microsoft | Enterprise, Office 365 integration |
| R2 | Cloudflare | Zero egress fees, S3 compatible |
3. Storage Classes
Different storage classes suit different access patterns. Cheaper classes generally trade lower storage cost for slower or costlier retrieval.
| Storage Class | Retrieval Time | Price (monthly) | Use Case |
|---|---|---|---|
| Standard | Instant | $0.023/GB | Frequently accessed files |
| Infrequent Access | Instant | $0.0125/GB | Monthly backups, disaster recovery |
| Glacier | Minutes-hours | $0.004/GB | Archives, compliance data |
| Deep Archive | 12-48 hours | $0.00099/GB | Long-term archives, rarely accessed |
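The price gap between classes is easier to judge with a quick calculation. The sketch below uses the per-GB prices listed above; the 10 TB archive size is a hypothetical example, and the function deliberately covers storage cost only.

```python
# Per-GB monthly storage prices listed above (USD).
PRICE_PER_GB_MONTH = {
    "Standard": 0.023,
    "Infrequent Access": 0.0125,
    "Glacier": 0.004,
    "Deep Archive": 0.00099,
}


def monthly_storage_cost(size_gb, storage_class):
    """Storage-only cost; ignores retrieval, request, and egress fees."""
    return size_gb * PRICE_PER_GB_MONTH[storage_class]


size_gb = 10_000  # hypothetical 10 TB archive
for storage_class in PRICE_PER_GB_MONTH:
    cost = monthly_storage_cost(size_gb, storage_class)
    print(f"{storage_class:<18} ${cost:,.2f}/month")
```

For 10 TB this works out to about $230 per month in Standard versus about $9.90 in Deep Archive, before any retrieval charges that the colder classes may add.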
4. CDN (Content Delivery Network)
A CDN caches content at edge locations worldwide. Users fetch from the nearest edge server instead of the origin, reducing latency from 200ms+ to <50ms.
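Conceptually, each edge location behaves like a read-through cache in front of the origin. The following is a minimal sketch under that assumption; `EdgeCache`, `fake_origin`, and the one-hour TTL are hypothetical and do not model any specific CDN's behavior.

```python
import time


class EdgeCache:
    """Toy read-through cache standing in for one CDN edge location."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600):
        self._fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self._ttl = ttl_seconds
        self._entries = {}  # path -> (body, expiry timestamp)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0], "HIT"
        # Miss or expired: go to the origin, then keep a copy at the edge.
        body = self._fetch_from_origin(path)
        self._entries[path] = (body, time.monotonic() + self._ttl)
        return body, "MISS"

    def purge(self, path):
        # Invalidation: drop the cached copy so the next request refetches.
        self._entries.pop(path, None)


origin_requests = []


def fake_origin(path):
    # Stand-in for the slow trip to the origin; records every request it sees.
    origin_requests.append(path)
    return f"content of {path}".encode()


tokyo_edge = EdgeCache(fake_origin)
print(tokyo_edge.get("/images/logo.png")[1])  # MISS
print(tokyo_edge.get("/images/logo.png")[1])  # HIT
print(len(origin_requests))                   # 1
```

Only the first request reaches the origin; later requests for the same path are answered from the edge until the entry expires or is purged.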
How CDN Works
User (Tokyo) → Edge (5ms) → Cache HIT?
- Yes: Return
- No: Fetch origin → Origin (US-East)
Popular CDN Providers
| Provider | Edge Locations | Features |
|---|---|---|
| Cloudflare | 300+ | Free tier, DDoS protection, Workers |
| AWS CloudFront | 400+ | S3 integration, Lambda@Edge |
| Fastly | 80+ | Real-time purging, edge compute |
| Akamai | 4000+ | Enterprise, largest network |
5. Object Storage + CDN Architecture
Typical Setup for Images/Videos
1. Upload: User uploads image → App server → S3 bucket
2. Process: Trigger Lambda → Resize/compress → Store variants
3. Serve: CDN URL → Edge cache → Origin (S3) on miss
4. Invalidate: On update, purge CDN cache for affected paths
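The processing and invalidation steps above depend on a predictable naming scheme for variants. The sketch below shows one possible scheme; the `variants/` prefix, the `_w<width>` suffix, and the three widths are assumptions made for illustration, not a convention the setup requires.

```python
from pathlib import PurePosixPath

VARIANT_WIDTHS = (128, 512, 1024)  # hypothetical thumbnail/medium/large sizes


def variant_key(original_key, width):
    """'uploads/user-123.jpg' -> 'variants/user-123_w512.jpg' (assumed layout)."""
    path = PurePosixPath(original_key)
    return f"variants/{path.stem}_w{width}{path.suffix}"


def keys_after_processing(original_key):
    # Step 2: the processing job stores one resized variant per width.
    return [variant_key(original_key, w) for w in VARIANT_WIDTHS]


def paths_to_purge(original_key):
    # Step 4: when the original changes, every derived variant is stale too.
    keys = [original_key, *keys_after_processing(original_key)]
    return ["/" + key for key in keys]


print(keys_after_processing("uploads/user-123.jpg"))
print(paths_to_purge("uploads/user-123.jpg"))
```

Deriving variant keys from the original key means the purge list can be computed without a lookup, so an update to one image invalidates exactly the paths it affects.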
URL Strategy
Use CDN URLs in your app:
https://cdn.example.com/images/user-123-v2.jpg
Include version in filename for easy cache busting.
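A small helper can keep versioned URLs consistent across the app. This is a sketch only: `versioned_url` and its parameters are hypothetical, and the `-v<number>` suffix simply mirrors the example URL above.

```python
CDN_BASE = "https://cdn.example.com"


def versioned_url(directory, name, version, extension):
    """Build a CDN URL whose filename changes whenever the version does."""
    return f"{CDN_BASE}/{directory}/{name}-v{version}.{extension}"


old_url = versioned_url("images", "user-123", 1, "jpg")
new_url = versioned_url("images", "user-123", 2, "jpg")
print(new_url)
```

Because the new version has a different URL, edge caches treat it as a new object and the old cached copy never needs an explicit purge.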
6. When to Use What
Object Storage (S3)
- User uploads (images, videos, documents)
- Backup and archive data
- Static website hosting
- Data lake for analytics
- Log storage
CDN
- Static assets (CSS, JS, images)
- Video streaming
- API response caching
- Global low-latency delivery
- DDoS protection
7. Key Takeaways
1. Object Storage (S3): Cheap, durable, unlimited storage for files.
2. CDN caches content at edge locations for global low latency.
3. Use storage classes wisely: Standard for hot data, Glacier for archives.
4. Combine S3 + CDN: S3 as origin, CDN for delivery.
5. Cache invalidation: Version files (v2.jpg) or purge CDN cache.
6. In interviews: Discuss upload flow, processing pipeline, and delivery strategy.