System Design Glossary
A comprehensive reference of essential system design terms and concepts. Bookmark this page for quick lookups during your preparation.
API (Application Programming Interface)
[Communication] A set of protocols and tools that allows different software applications to communicate with each other. REST, GraphQL, and gRPC are common API styles.
Availability
[Reliability] The percentage of time a system is operational and accessible. Measured in "nines" (e.g., 99.99% = "four nines" ≈ 52.6 minutes of downtime per year).
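As a rough illustration of the "nines" arithmetic, the sketch below converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for this example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in minutes per year, for a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_minutes_per_year(target):.1f} min/year")
```

Each extra nine cuts the downtime budget by a factor of ten.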
Asynchronous
[Communication] A communication pattern where the sender doesn't wait for a response before continuing. Enables better scalability and decoupling.
Blob Storage
[Storage] Binary Large Object storage for unstructured data like images, videos, and files. Examples: AWS S3, Azure Blob Storage, Google Cloud Storage.
Broker
[Communication] An intermediary that receives messages from producers and delivers them to consumers. Examples: Kafka, RabbitMQ, Redis Pub/Sub.
Cache
[Performance] A high-speed data storage layer that stores a subset of data for faster access. Reduces database load and improves response times.
CAP Theorem
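[Distributed Systems] States that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. Because network partitions cannot be ruled out in practice, the real choice is between consistency and availability while a partition lasts.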
CDN (Content Delivery Network)
[Performance] A geographically distributed network of servers that delivers content from locations closest to users, reducing latency.
Cluster
[Infrastructure] A group of servers working together as a single system to provide high availability and load distribution.
Concurrency
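[Performance] The ability to make progress on multiple tasks or requests during overlapping time periods, for example by interleaving them on a single core. Different from parallelism, which means executing tasks at the exact same time on multiple cores or machines.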
Consistency
[Distributed Systems] All nodes in a distributed system see the same data at the same time. Strong consistency ensures reads always return the latest write.
Container
[Infrastructure] A lightweight, standalone package that includes everything needed to run an application. Docker is the most popular containerization platform.
Database Index
[Storage] A data structure that improves the speed of data retrieval operations on a database table at the cost of additional storage and write overhead.
Dead Letter Queue (DLQ)
[Communication] A queue that stores messages that couldn't be processed successfully after multiple retries. Used for debugging and recovery.
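A minimal in-memory sketch of the retry-then-dead-letter flow. The `consume` function, the `(message, attempts)` tuple layout, and the `max_attempts` default are hypothetical choices for this example, not the API of any particular broker.

```python
from collections import deque


def consume(queue, handler, dead_letters, max_attempts=3):
    """Drain the queue, retrying failed messages up to max_attempts times.

    Each queue entry is a (message, attempts_so_far) tuple. Messages that
    still fail on their final attempt are moved to dead_letters.
    """
    while queue:
        message, attempts = queue.popleft()
        try:
            handler(message)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(message)  # give up; keep for inspection
            else:
                queue.append((message, attempts))  # requeue for another try
```

Real brokers usually add a delay between retries; that is left out here to keep the sketch short.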
Denormalization
[Storage] The process of adding redundant data to a database to improve read performance at the cost of write complexity.
DNS (Domain Name System)
[Infrastructure] The internet's phone book that translates human-readable domain names (google.com) to IP addresses.
Eventual Consistency
[Distributed Systems] A consistency model where replicas will eventually converge to the same state once new writes stop arriving, but not immediately after a write.
Event-Driven Architecture
[Architecture] A design pattern where the flow of the program is determined by events (messages) rather than sequential logic.
Failover
[Reliability] The automatic switching to a backup system when the primary system fails. Critical for high availability.
Fan-out
[Communication] A messaging pattern where a single message is delivered to multiple consumers. Used in notification systems and social media feeds.
gRPC
[Communication] A high-performance RPC (Remote Procedure Call) framework that runs over HTTP/2 and uses Protocol Buffers for serialization. Typically faster than JSON-based REST for service-to-service communication.
Heartbeat
[Reliability] Periodic signals sent between systems to indicate they are operational. Used for health monitoring and failure detection.
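One simple way to turn heartbeats into failure detection is a timeout: a node is suspected once no heartbeat has arrived within a fixed window. The `HeartbeatMonitor` class below is a hypothetical sketch of that idea; the injectable `clock` parameter is an assumption added so the behavior can be tested without sleeping.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports nodes that went quiet."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = {}  # node id -> time of most recent heartbeat

    def record(self, node_id):
        """Call whenever a heartbeat arrives from node_id."""
        self._last_seen[node_id] = self._clock()

    def failed_nodes(self):
        """Return ids of nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            node for node, seen in self._last_seen.items()
            if now - seen > self._timeout
        )
```

A fixed timeout trades detection speed against false positives: a short window reacts quickly but may flag nodes that are merely slow.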
Horizontal Scaling
[Scalability] Adding more machines to handle increased load (scaling out). Contrast with vertical scaling (adding resources to an existing machine).
Hot Spot
[Performance] A condition where a disproportionate amount of traffic or load is directed to a single node or partition.
Idempotency
[Reliability] A property where executing an operation multiple times has the same effect as executing it once. Critical for retry logic.
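A common way to make a non-idempotent operation safe to retry is to have the client send a unique key and have the server remember the result for that key. The `PaymentService` class and its fields below are hypothetical, and the in-memory dictionary stands in for a durable store.

```python
class PaymentService:
    """Toy service that deduplicates charges by idempotency key."""

    def __init__(self):
        self._results = {}  # idempotency key -> result of the first call
        self.charges_made = 0

    def charge(self, idempotency_key, amount_cents):
        # A retry with a known key returns the stored result instead of
        # charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges_made += 1
        result = {"charge_id": self.charges_made, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result
```

With this in place, a client that times out and retries with the same key cannot charge the customer twice.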
Latency
[Performance] The time it takes for a request to travel from sender to receiver and for the response to come back (round-trip time). Usually measured in milliseconds (ms).
Load Balancer
[Infrastructure] A component that distributes incoming network traffic across multiple servers to ensure no single server is overwhelmed.
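Round robin is one of the simplest distribution strategies: requests are handed to servers in a fixed rotation. The sketch below assumes a static server list and ignores health checks and weights; the class name is hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._cycle)
```

Other strategies, such as least connections or hashing on a client attribute, follow the same shape with a different selection rule.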
Long Polling
[Communication] A technique where the client makes a request and the server holds it open until new data is available, simulating real-time updates.
Message Queue
[Communication] A form of asynchronous communication where messages are stored in a queue until they can be processed. Decouples producers from consumers.
Microservices
[Architecture] An architectural style where an application is composed of small, independent services that communicate via APIs.
Monolith
[Architecture] An architectural style where all components of an application are tightly coupled into a single deployable unit.
NoSQL
[Storage] A category of databases that don't use the traditional relational model. Types include document (MongoDB), key-value (Redis), wide-column (Cassandra), and graph (Neo4j).
Partition
[Scalability] A division of a database or message queue topic. Partitioning a database across machines is also known as sharding. Enables horizontal scaling.
Partition Tolerance
[Distributed Systems] A system's ability to continue operating despite network partitions (communication failures between nodes).
Primary-Replica
[Scalability] A replication pattern where one node (primary) handles writes and replicates data to read-only replicas. Also called leader-follower; older documentation uses the term master-slave.
Pub/Sub
[Communication] A messaging pattern where publishers send messages to topics and subscribers receive messages from topics they're interested in.
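The topic-based routing can be sketched with an in-memory broker that keeps a list of callbacks per topic. The `InMemoryBroker` class is hypothetical and delivers synchronously within one process; real brokers deliver across the network and often persist messages.

```python
from collections import defaultdict


class InMemoryBroker:
    """Toy pub/sub broker: topics map to lists of subscriber callbacks."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register callback to be invoked for every message on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return the count."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

Publishers only know the topic name, never the subscribers, which is what keeps the two sides decoupled.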
QPS (Queries Per Second)
[Performance] A metric measuring the number of queries a system can handle per second. Used for capacity planning.
Quorum
[Distributed Systems] The minimum number of nodes that must agree on an operation for it to be considered successful. Common formula for a majority quorum: floor(N/2) + 1, where N is the number of nodes.
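The majority formula is easiest to check with a few concrete cluster sizes. The sketch below reads N/2 as integer division, which matters for odd N; the function names are chosen for this example.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    if n < 1:
        raise ValueError("cluster must have at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while a majority can still be formed."""
    return n - majority_quorum(n)


for n in (3, 4, 5):
    print(n, majority_quorum(n), tolerated_failures(n))
```

A 4-node cluster needs 3 votes and tolerates 1 failure, the same as a 3-node cluster, which is why odd cluster sizes are common.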
Rate Limiting
[Reliability] Controlling the number of requests a user or client can make in a given time period. Prevents abuse and ensures fair usage.
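Token bucket is one widely used rate-limiting algorithm: tokens refill at a steady rate up to a cap, and each request spends one. The sketch below is a single-process illustration; the class name, parameters, and injectable `clock` are assumptions for this example.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, then `refill_per_second` on average."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self):
        """Return True and spend a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never exceeding capacity.
        self._tokens = min(
            self.capacity, self._tokens + elapsed * self.refill_per_second
        )
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

In a multi-server deployment the bucket state would typically live in a shared store so that all servers enforce the same limit.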
Redundancy
[Reliability] Duplication of critical components to increase reliability. If one fails, others can take over.
Replication
[Scalability] Copying data across multiple nodes to improve availability and read performance. Replication can be synchronous or asynchronous.
REST
[Communication] Representational State Transfer. An architectural style for APIs that uses HTTP methods (GET, POST, PUT, DELETE) to perform CRUD operations.
Sharding
[Scalability] Splitting data across multiple databases based on a shard key. Each shard contains a subset of the data.
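A minimal sketch of hash-based shard selection, assuming string shard keys and a fixed number of shards. The `shard_for` function is hypothetical; SHA-256 is used only because it gives the same result on every machine, unlike Python's built-in `hash`, which is randomized per process for strings.

```python
import hashlib


def shard_for(shard_key: str, num_shards: int) -> int:
    """Map a shard key to a shard index in the range [0, num_shards)."""
    if num_shards < 1:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards
```

With plain modulo, changing `num_shards` remaps most keys; schemes such as consistent hashing exist to reduce that movement.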
SLA (Service Level Agreement)
[Reliability] A contract defining the expected level of service, including availability, response time, and support.
Stateless
[Architecture] A service that doesn't store client session data between requests. Each request contains all information needed to process it.
Synchronous
[Communication] A communication pattern where the sender waits for a response before continuing. Simpler but can create bottlenecks.
Throughput
[Performance] The number of operations completed or the amount of data transferred per unit of time. Measured in requests/second or bytes/second.
TTL (Time To Live)
[Performance] A mechanism that limits the lifespan of data. Commonly used in caching and DNS to expire stale data.
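In a cache, a TTL is often implemented by storing an expiry time next to each value and treating expired entries as missing. The `TTLCache` class below is a hypothetical sketch of that approach; the injectable `clock` is an assumption that makes expiry testable without waiting.

```python
import time


class TTLCache:
    """Dictionary-backed cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict the stale entry
            return default
        return value
```

Expired entries are only removed when they are read, so a long-running cache would also need periodic cleanup or a size limit.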
Vertical Scaling
[Scalability] Adding more resources (CPU, RAM, storage) to an existing machine (scaling up). Has hardware limits.
WebSocket
[Communication] A protocol providing full-duplex communication channels over a single TCP connection. Used for real-time applications.
Write-Ahead Log (WAL)
[Storage] A technique where changes are written to a log before being applied to the database. Ensures durability and crash recovery.
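The log-first ordering can be illustrated with a toy key-value store that appends each change to a file before updating memory, then replays the file on startup. The `WalStore` class and its JSON-lines log format are assumptions for this sketch; production databases use binary formats, checksums, and log truncation.

```python
import json
import os


class WalStore:
    """Toy key-value store that logs every write before applying it."""

    def __init__(self, log_path):
        self._log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        """Rebuild in-memory state from the log, e.g. after a crash."""
        if not os.path.exists(self._log_path):
            return
        with open(self._log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.data[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the change to the log and force it to disk.
        with open(self._log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Only then apply it to the in-memory state.
        self.data[key] = value
```

Because the log entry reaches disk before the in-memory update, a crash between the two steps loses nothing: the replay on restart reapplies the change.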
Test Your Knowledge
Which statement best describes the CAP theorem?
What is the primary purpose of a load balancer?
In the context of databases, what does ACID stand for?
What is the difference between a CDN and a cache?
What does 'idempotency' mean in the context of APIs?