Module 8 - Networking and APIs

Protocol Buffers & Serialization

How data is encoded for transmission: JSON vs binary formats.

1. The Packaging Analogy

Simple Analogy

Sending data is like shipping furniture. JSON is like shipping it fully assembled: easy to inspect, but bulky. Protocol Buffers is like IKEA flat-pack: compact and efficient, but you need the instructions (the schema) to reassemble it.

Serialization converts in-memory objects to bytes for transmission or storage. The format you choose affects speed, size, and compatibility.
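A minimal sketch of that round trip, in Python (a language chosen for illustration; the module doesn't name one), using the standard-library `json` module. The `user` dict is an illustrative record with the same fields as the JSON example later in this module.

```python
import json

# An in-memory object: a dict standing in for a User record
user = {"id": 12345, "name": "John Doe", "email": "john@example.com",
        "age": 30, "active": True}

# Serialize: object -> JSON text -> UTF-8 bytes (minified, no extra whitespace)
payload = json.dumps(user, separators=(",", ":")).encode("utf-8")

# Deserialize: bytes -> JSON text -> object
restored = json.loads(payload.decode("utf-8"))

print(len(payload))      # → 80
print(restored == user)  # → True
```

Every other format in this module does the same job (object to bytes and back); they differ in how many bytes that takes and how much the reader must already know.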

2. Common Formats

JSON

Text • Size: 100% (baseline)

Pros

✓ Human readable
✓ Universal support
✓ Self-describing

Cons

✗ Verbose (field names repeated)
✗ Slower parsing
✗ No schema enforcement

Protocol Buffers

Binary • Size: 30-50% of JSON

Pros

✓ Compact (roughly 2-3x smaller than JSON)
✓ Fast serialization
✓ Strongly typed schema

Cons

✗ Not human readable
✗ Requires .proto files
✗ Schema evolution complexity

MessagePack

Binary • Size: 50-70% of JSON

Pros

✓ JSON-compatible
✓ Smaller than JSON
✓ No schema needed

Cons

✗ Still sends field names
✗ Less compact than Protobuf
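To see why MessagePack still carries field names, here is a minimal hand-written encoder for a small subset of the MessagePack format (positive fixint, uint 16, fixstr, fixmap, true/false), in Python. It is an illustrative sketch, not the `msgpack` library; `pack` is a hypothetical helper name.

```python
import struct

def pack(obj) -> bytes:
    """Encode a small subset of MessagePack: bools, ints in [0, 65535],
    strings shorter than 32 bytes, and maps with fewer than 16 entries."""
    if isinstance(obj, bool):  # test bool before int: bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int) and 0 <= obj < 128:
        return bytes([obj])                              # positive fixint
    if isinstance(obj, int) and 0 <= obj < 2**16:
        return b"\xcd" + struct.pack(">H", obj)          # uint 16, big-endian
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data      # fixstr
    if isinstance(obj, dict) and len(obj) < 16:
        out = bytes([0x80 | len(obj)])                   # fixmap header
        for key, value in obj.items():
            out += pack(key) + pack(value)               # every key goes on the wire
        return out
    raise ValueError(f"value not supported by this sketch: {obj!r}")

user = {"id": 12345, "name": "John Doe", "email": "john@example.com",
        "age": 30, "active": True}
encoded = pack(user)
print(len(encoded))          # → 57
print(b"email" in encoded)   # → True: field names are still sent
```

Type markers and lengths replace JSON's quotes, colons, and whitespace, so the payload shrinks, but each key is still written as a string before its value. Protobuf avoids that cost by sending field numbers instead.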

Avro

Binary • Size: 40-60% of JSON

Pros

✓ Schema evolution
✓ Compact
✓ Good for big data

Cons

✗ Schema required to read (the data carries no field names or tags)
✗ Less common
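A sketch of Avro's binary encoding for a single record, hand-written in Python for illustration. The schema and helper names are hypothetical; real code would use an Avro library, and Avro data files also add a header that embeds the schema. Ints are zigzag-encoded varints, strings are a length prefix plus UTF-8 bytes, booleans are one byte, and fields are written in schema order with no names or tags, which is why the reader cannot decode anything without the schema.

```python
# Hypothetical record schema: (field name, Avro type) in declaration order
SCHEMA = [("id", "int"), ("name", "string"), ("email", "string"),
          ("age", "int"), ("active", "boolean")]

def write_long(n: int) -> bytes:
    """Avro int/long: zigzag-map the sign, then write a base-128 varint."""
    z = (n << 1) ^ (n >> 63)   # 0, -1, 1, -2, ... -> 0, 1, 2, 3, ...
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def read_long(buf: bytes, pos: int):
    z = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos  # undo the zigzag mapping

def encode(record: dict) -> bytes:
    out = bytearray()
    for name, kind in SCHEMA:            # values only: no names, no tags
        value = record[name]
        if kind == "int":
            out += write_long(value)
        elif kind == "string":
            data = value.encode("utf-8")
            out += write_long(len(data)) + data  # length prefix, then bytes
        elif kind == "boolean":
            out.append(1 if value else 0)
    return bytes(out)

def decode(buf: bytes) -> dict:
    record, pos = {}, 0
    for name, kind in SCHEMA:            # the schema says what each byte means
        if kind == "int":
            record[name], pos = read_long(buf, pos)
        elif kind == "string":
            length, pos = read_long(buf, pos)
            record[name] = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif kind == "boolean":
            record[name] = buf[pos] == 1
            pos += 1
    return record

user = {"id": 12345, "name": "John Doe", "email": "john@example.com",
        "age": 30, "active": True}
encoded = encode(user)
print(len(encoded))             # → 31
print(decode(encoded) == user)  # → True
```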

3. JSON vs Protobuf Example

JSON (~100 bytes as formatted; 80 bytes minified)

{
  "id": 12345,
  "name": "John Doe",
  "email": "john@example.com",
  "age": 30,
  "active": true
}

Protobuf (~35 bytes)

// Schema (.proto file)
syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  int32 age = 4;
  bool active = 5;
}

// Encoded bytes: a tag (field number + wire type) plus the value for each field
// No field names transmitted!

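Roughly 3x smaller than the formatted JSON (about 2.3x smaller than minified JSON), because Protobuf sends a 1-byte tag (field number + wire type, for field numbers 1-15) instead of each field name ("email" = 5 bytes), and encodes integers as compact binary varints instead of decimal text.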
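The Protobuf byte count can be checked by hand-encoding the `User` message in Python. This is a minimal sketch of the wire format for the schema above (varint and length-delimited wire types only, non-negative integers only), not the official `protobuf` library; `encode_user` and the other helper names are hypothetical. As with proto3 fields not marked `optional`, values equal to the default are not written.

```python
VARINT, LEN = 0, 2  # Protobuf wire types: varint (int32, bool) and length-delimited (string)

def varint(n: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    if n < 0:
        raise ValueError("negative values are not handled in this sketch")
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def tag(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)  # 1 byte for fields 1-15

# Field numbers and wire types copied from the User schema
USER_FIELDS = [("id", 1, VARINT), ("name", 2, LEN), ("email", 3, LEN),
               ("age", 4, VARINT), ("active", 5, VARINT)]

def encode_user(user: dict) -> bytes:
    out = bytearray()
    for name, number, wire_type in USER_FIELDS:
        value = user[name]
        if not value:  # default values (0, "", False) are skipped
            continue
        if wire_type == VARINT:
            out += tag(number, VARINT) + varint(int(value))  # True -> 1
        else:
            data = value.encode("utf-8")
            out += tag(number, LEN) + varint(len(data)) + data
    return bytes(out)

user = {"id": 12345, "name": "John Doe", "email": "john@example.com",
        "age": 30, "active": True}
encoded = encode_user(user)
print(len(encoded))        # → 35
print(encoded[:3].hex())   # → 08b960: tag for field 1, then 12345 as a varint
```

The one-byte tags (0x08, 0x12, 0x1a, 0x20, 0x28) are all that identify the fields; the names exist only in the schema.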

4. Schema Evolution

Schemas change over time. How do old clients read data written with a newer schema, and how do new clients read old data?

Add Field

Compatible: old clients skip fields they don't recognize; new clients reading old data get the field's default value.

Use new field numbers. Never reuse deleted field numbers.
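A minimal Python sketch of why adding a field is safe: a reader walks the tag/value pairs, keeps the field numbers its schema knows, skips the rest, and falls back to defaults for anything missing. The `decode` helper, both schemas, and the `phone` field (number 6) are hypothetical; real Protobuf libraries handle this internally.

```python
def read_varint(buf: bytes, pos: int):
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pos

def decode(buf: bytes, schema: dict) -> dict:
    """schema maps field number -> (name, default); unknown numbers are skipped."""
    message = {name: default for name, default in schema.values()}  # start from defaults
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:                    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:                  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number not in schema:
            continue                          # field from a newer schema: skip it
        name, default = schema[number]
        if isinstance(default, str):
            value = value.decode("utf-8")
        elif isinstance(default, bool):
            value = bool(value)
        message[name] = value
    return message

OLD_SCHEMA = {1: ("id", 0), 2: ("name", "")}
NEW_SCHEMA = {1: ("id", 0), 2: ("name", ""), 6: ("phone", "")}  # hypothetical field 6

old_bytes = b"\x08\x2a\x12\x03Ann"     # id=42, name="Ann"
new_bytes = old_bytes + b"\x32\x03555"  # ...plus field 6, phone="555"

print(decode(new_bytes, OLD_SCHEMA))  # → {'id': 42, 'name': 'Ann'}
print(decode(old_bytes, NEW_SCHEMA))  # → {'id': 42, 'name': 'Ann', 'phone': ''}
```

Because the length-delimited value carries its own length, the old reader can step over field 6 without knowing what it means.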

Remove Field

Compatible, if you mark the number as reserved: old data may still contain the field, and new code ignores it.

reserved 3; // Don't reuse this number

Rename Field

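Compatible: the field number is unchanged, and names are never sent in the binary encoding. (Protobuf's JSON mapping does use field names, so JSON-encoded data is affected.)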

Change the name in the .proto file; keep the same field number.

Change Type

Unsafe: changing int32 to string breaks compatibility, because the wire type changes (varint vs length-delimited).

Avoid. Add a new field with a new number instead.
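Why a type change breaks things, in a short Python sketch: every field starts with a tag that packs the field number and wire type together, so redeclaring `age` (field 4) from int32 to string changes the wire type from varint (0) to length-delimited (2). The helper names are hypothetical; depending on the library, a reader with the old schema treats such a field as unknown or fails to parse it, and either way it cannot recover the value as an int32.

```python
VARINT, LEN = 0, 2  # wire types for int32 and string fields

def tag(field_number: int, wire_type: int) -> int:
    # Field number in the high bits, wire type in the low 3 bits
    return (field_number << 3) | wire_type

def reader_understands(expected_wire_type: int, tag_value: int) -> bool:
    # A reader can interpret a field only if its wire type matches the schema
    return (tag_value & 0x7) == expected_wire_type

as_int32 = tag(4, VARINT)   # age declared as int32
as_string = tag(4, LEN)     # the same field 4 redeclared as string

print(hex(as_int32), hex(as_string))          # → 0x20 0x22
print(reader_understands(VARINT, as_string))  # → False
```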

5. Performance Comparison

Format	Size (vs JSON)	Encode speed (vs JSON)	Decode speed (vs JSON)
JSON	100%	1x	1x
Protobuf	30-50%	3-10x	3-10x
MessagePack	50-70%	2-3x	2-3x
Avro	40-60%	2-5x	2-5x

6. When to Use What

JSON

Use for: public APIs, web apps, debugging, config files

Why: universal support, human readable, easy to debug

Protobuf

Use for: gRPC, internal microservices, high-throughput systems

Why: smallest size, fastest speed, type safety

MessagePack

Use for: Redis caching, or anywhere you want a more compact JSON

Why: drop-in JSON replacement, no schema needed

Avro

Use for: Kafka, big data pipelines, data lakes

Why: built-in schema evolution; the writer's schema is embedded in Avro data files

7. Key Takeaways

1. JSON for human-readable, public APIs: universal but verbose.

2. Protobuf for internal services: typically 2-3x smaller than JSON and 3-10x faster to encode and decode.

3. Field numbers (not names) identify fields on the wire, which is what makes schema evolution safe.

4. Never reuse deleted field numbers; mark them as reserved.

5. Choose based on the trade-off between speed/size and debuggability.

Quiz

1. A high-throughput internal service needs the smallest possible payload. Which format is the best choice?

2. You deleted field #5 from a .proto schema. What should you do?