Module 8 - Networking and APIs

Protocol Buffers & Serialization

How data is encoded for transmission: JSON vs binary formats.

1. The Packaging Analogy

Simple Analogy
Sending data is like shipping furniture. JSON is like shipping it fully assembled: easy to inspect, but bulky. Protocol Buffers is like IKEA flat-pack: compact and efficient, but you need the instructions (the schema) to reassemble it.

Serialization converts in-memory objects to bytes for transmission or storage; deserialization turns those bytes back into objects. The format you choose affects speed, size, and compatibility.
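A minimal Python sketch of that round trip, using the standard-library `json` module (the `user` dictionary is a hypothetical stand-in for an in-memory object):

```python
import json

# Hypothetical in-memory object: a plain dict standing in for a User record.
user = {"id": 12345, "name": "John Doe", "email": "john@example.com", "age": 30, "active": True}

# Serialize: object -> JSON text -> UTF-8 bytes, ready to send or store.
payload: bytes = json.dumps(user).encode("utf-8")

# Deserialize on the receiving side: bytes -> JSON text -> object.
restored = json.loads(payload.decode("utf-8"))

print(len(payload), restored == user)
```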

2. Common Formats

JSON

Text format, size baseline (100%)
Pros
  • Human readable
  • Universal support
  • Self-describing
Cons
  • Verbose (field names repeated)
  • Slower parsing
  • No schema enforcement

Protocol Buffers

Binary format, typically 30-50% of JSON's size
Pros
  • Compact (often 2-3x smaller than JSON)
  • Fast serialization
  • Strongly typed schema
Cons
  • Not human readable
  • Requires .proto files
  • Schema evolution complexity

MessagePack

Binary format, typically 50-70% of JSON's size
Pros
  • JSON-compatible
  • Smaller than JSON
  • No schema needed
Cons
  • Still has field names
  • Less compact than Protobuf
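To see why MessagePack is smaller than JSON yet still carries field names, here is a hand-written encoder for a small subset of the MessagePack format (small maps, short strings, non-negative integers, booleans). It is a sketch for illustration only: the `pack` function is hypothetical, and real code would use a complete library such as the `msgpack` package.

```python
import struct


def pack(value) -> bytes:
    """Encode a small subset of MessagePack (illustrative sketch, not a full encoder)."""
    if value is True:
        return b"\xc3"  # true
    if value is False:
        return b"\xc2"  # false
    if isinstance(value, int) and value >= 0:
        if value <= 0x7F:
            return bytes([value])  # positive fixint: the value is the whole byte
        if value <= 0xFF:
            return b"\xcc" + bytes([value])  # uint8
        if value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)  # uint16, big-endian
        if value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)  # uint32, big-endian
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data  # fixstr: length packed into the type byte
        if len(data) <= 0xFF:
            return b"\xd9" + bytes([len(data)]) + data  # str8: 1-byte length
    if isinstance(value, dict) and len(value) <= 15:
        out = bytes([0x80 | len(value)])  # fixmap: entry count packed into the type byte
        for key, item in value.items():
            out += pack(key) + pack(item)  # keys (the field names) are encoded every time
        return out
    raise TypeError(f"not supported by this sketch: {value!r}")


user = {"id": 12345, "name": "John Doe", "email": "john@example.com", "age": 30, "active": True}
encoded = pack(user)
print(len(encoded))  # 57
```

The field names are still in the output, but quotes, colons, and commas are replaced by a one-byte prefix that carries each value's type and length.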

Avro

Binary format, typically 40-60% of JSON's size
Pros
  • Schema evolution
  • Compact
  • Good for big data
Cons
  • Writer's schema needed to read data
  • Less common
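Avro's binary encoding drops field names entirely: values are written back to back in the order the schema declares them, so the bytes are meaningless without the writer's schema. This sketch hand-encodes the sample user record (the same one used in the JSON vs Protobuf example) following the Avro specification's rules for `int` (zig-zag varint), `string` (length prefix plus UTF-8 bytes), and `boolean` (one byte). `USER_SCHEMA` and the helper functions are hypothetical simplifications; a real project would use an Avro library with a JSON schema definition.

```python
# Hypothetical schema: (field name, Avro type) pairs in declaration order.
USER_SCHEMA = [("id", "int"), ("name", "string"), ("email", "string"),
               ("age", "int"), ("active", "boolean")]


def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zig-zag map (0, -1, 1, -2 -> 0, 1, 2, 3), then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte, z = z & 0x7F, z >> 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_zigzag(buf: bytes, pos: int):
    """Inverse of zigzag_varint; returns (value, next position)."""
    z, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        z |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (z >> 1) ^ -(z & 1), pos
        shift += 7


def encode_record(schema, record: dict) -> bytes:
    out = bytearray()
    for name, avro_type in schema:  # schema order; no names or tags are written
        value = record[name]
        if avro_type == "int":
            out += zigzag_varint(value)
        elif avro_type == "string":
            data = value.encode("utf-8")
            out += zigzag_varint(len(data)) + data  # length prefix, then bytes
        elif avro_type == "boolean":
            out.append(1 if value else 0)
        else:
            raise TypeError(f"type {avro_type!r} not supported by this sketch")
    return bytes(out)


def decode_record(schema, buf: bytes) -> dict:
    record, pos = {}, 0
    for name, avro_type in schema:  # only the schema says what each byte means
        if avro_type == "int":
            record[name], pos = read_zigzag(buf, pos)
        elif avro_type == "string":
            length, pos = read_zigzag(buf, pos)
            record[name], pos = buf[pos:pos + length].decode("utf-8"), pos + length
        elif avro_type == "boolean":
            record[name], pos = buf[pos] == 1, pos + 1
        else:
            raise TypeError(f"type {avro_type!r} not supported by this sketch")
    return record


user = {"id": 12345, "name": "John Doe", "email": "john@example.com", "age": 30, "active": True}
encoded = encode_record(USER_SCHEMA, user)
print(len(encoded))  # 31
```

The output contains neither field names nor tags; `decode_record` recovers the record only because it is handed the same schema the writer used.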

3. JSON vs Protobuf Example

JSON (about 100 bytes as formatted below; 80 bytes minified)
{
  "id": 12345,
  "name": "John Doe",
  "email": "john@example.com",
  "age": 30,
  "active": true
}
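The exact JSON size depends on whitespace. A quick check with Python's standard `json` module, assuming UTF-8 output and the two-space indentation shown above:

```python
import json

user = {"id": 12345, "name": "John Doe", "email": "john@example.com", "age": 30, "active": True}

# Formatted with two-space indentation and newlines, as in the example above.
pretty = json.dumps(user, indent=2).encode("utf-8")
# Minified: no optional whitespace after commas and colons.
minified = json.dumps(user, separators=(",", ":")).encode("utf-8")

print(len(pretty), len(minified))  # 101 80
```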
Protobuf (~35 bytes)
// Schema (.proto file)
syntax = "proto3";

message User {
  int32 id = 1;
  string name = 2;
  string email = 3;
  int32 age = 4;
  bool active = 5;
}

// On the wire: each field is a tag (field number + wire type) followed by its value.
// No field names are transmitted!

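Roughly 2-3x smaller, because Protobuf writes a 1-byte tag (field number plus wire type, for field numbers 1-15) instead of the quoted field name (the name "email" alone is 5 bytes), and stores integers as binary varints instead of decimal text.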
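To make the size difference concrete, here is a stdlib-only sketch of the Protobuf wire format for the `User` message above: each field is written as a varint tag `(field_number << 3) | wire_type`, followed by a varint value (int32, bool) or a length prefix plus bytes (string). The helper names and the `USER_FIELDS` mapping are hypothetical; real code would use classes generated by `protoc` and call `SerializeToString()`.

```python
VARINT, LEN = 0, 2  # the two wire types this message needs

# Field numbers from the .proto schema; names exist only in code, never on the wire.
USER_FIELDS = {"id": 1, "name": 2, "email": 3, "age": 4, "active": 5}


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, low bits first, high bit means 'more follows'."""
    if n < 0:
        n += 1 << 64  # negative int32/int64 are sign-extended to 64 bits (10 bytes)
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_field(number: int, value) -> bytes:
    if isinstance(value, (bool, int)):
        tag = (number << 3) | VARINT
        return encode_varint(tag) + encode_varint(int(value))
    if isinstance(value, str):
        data = value.encode("utf-8")
        tag = (number << 3) | LEN
        return encode_varint(tag) + encode_varint(len(data)) + data
    raise TypeError(f"not supported by this sketch: {value!r}")


def encode_user(user: dict) -> bytes:
    out = b""
    for name, number in USER_FIELDS.items():  # field-number order
        value = user.get(name)
        if not value:
            continue  # missing, or at the proto3 default (0, "", false): not written
        out += encode_field(number, value)
    return out


user = {"id": 12345, "name": "John Doe", "email": "john@example.com", "age": 30, "active": True}
encoded = encode_user(user)
print(len(encoded))  # 35
print(encoded[:3].hex())  # 08b960: tag for field 1, then 12345 as a 2-byte varint
```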

4. Schema Evolution

Schemas change over time. How do old clients read new data, and new clients read old data?

Add Field

Safe: old clients skip fields they don't recognize; new clients see the default value (0, "", false) when reading old data that lacks the field.

Use new field numbers. Never reuse deleted field numbers.

Remove Field

Safe, if you mark the number as reserved: old data may still contain the field, and new code skips it as unknown.

reserved 3; // Don't reuse this number

Rename Field

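Safe on the binary wire format: the field number is unchanged and names are never transmitted. (The proto3 JSON mapping does use field names, so a rename breaks JSON-encoded data.)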

Change the name in the .proto file; the field number stays the same.

Change Type

Unsafe: changing int32 to string changes the wire encoding, so old and new code misread each other's data.

Avoid. Add a new field with a new number instead.
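These rules follow from how a decoder walks the wire format: it reads each tag, looks up the field number, and uses the wire type to skip numbers it does not know. Below is a stdlib-only sketch of an "old" reader that only knows fields 1 and 2; `OLD_SCHEMA` and the helper names are hypothetical, and only the varint and length-delimited wire types are handled.

```python
VARINT, LEN = 0, 2  # wire types handled by this sketch


def encode_varint(n: int) -> bytes:
    """Non-negative base-128 varint (used here only to build sample messages)."""
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# What an "old" client was built with: field number -> (name, expected wire type).
OLD_SCHEMA = {1: ("id", VARINT), 2: ("name", LEN)}


def decode(buf: bytes, schema: dict) -> dict:
    # Start from proto3-style defaults so fields missing from the data read as 0 / "".
    msg = {name: 0 if wire_type == VARINT else "" for name, wire_type in schema.values()}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == VARINT:
            value, pos = read_varint(buf, pos)
        elif wire_type == LEN:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled by this sketch")
        if number not in schema:
            continue  # unknown field (e.g. added by a newer writer): skip it
        name, expected = schema[number]
        if wire_type != expected:
            raise ValueError(f"field {number}: wire type changed")  # e.g. int32 -> string
        msg[name] = value.decode("utf-8") if wire_type == LEN else value
    return msg


# A newer writer sends field 1 (id = 42) plus a new field 6 (a string) the old schema lacks.
new_msg = (encode_varint((1 << 3) | VARINT) + encode_varint(42)
           + encode_varint((6 << 3) | LEN) + encode_varint(2) + b"hi")
print(decode(new_msg, OLD_SCHEMA))  # {'id': 42, 'name': ''}
```

Renames are invisible here because names live only in `OLD_SCHEMA`, never in the bytes. A type change trips the wire-type check; this sketch raises an error to make the break visible, while real Protobuf libraries may handle a mismatch differently, but the value is still not read as the new type.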

5. Performance Comparison

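| Format      | Size (vs JSON) | Encode speed  | Decode speed  |
|-------------|----------------|---------------|---------------|
| JSON        | 100%           | 1x (baseline) | 1x (baseline) |
| Protobuf    | 30-50%         | 3-10x         | 3-10x         |
| MessagePack | 50-70%         | 2-3x          | 2-3x          |
| Avro        | 40-60%         | 2-5x          | 2-5x          |

These are rough figures; actual results depend on the payload shape and the library used.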

6. When to Use What

JSON

Use for: public APIs, web apps, debugging, config files

Why: universal support, human readable, easy to debug

Protobuf

Use for: gRPC, internal microservices, high-throughput systems

Why: smallest size, fastest speed, type safety

MessagePack

Use for: Redis caching, or anywhere you want a more compact JSON

Why: near drop-in JSON replacement, no schema needed

Avro

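Use for: Kafka, big data pipelines, data lakes

Why: strong schema evolution; the writer's schema travels with the data (embedded in Avro container files, or referenced by ID through a schema registry)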

7. Key Takeaways

1. JSON for human-readable, public APIs: universal but verbose.
2. Protobuf for internal services: typically 30-50% of JSON's size and 3-10x faster to encode and decode.
3. Field numbers (not names) enable schema evolution.
4. Never reuse deleted field numbers; mark them as reserved.
5. Choose based on the speed/size vs debuggability tradeoff.

Quiz

1. A high-throughput internal service needs the smallest payload. Which format is the best choice?

2. You deleted field number 5 from a .proto schema. What should you do?