Elasticsearch
Full-text search, log analytics, and real-time insights at scale.
1. The Library Index Analogy
Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It's designed for full-text search, log analytics, and real-time data exploration.
2. Core Concepts
Index
A collection of documents, roughly analogous to a database table. Its mapping defines the schema.
Examples: logs-2024-01, products, users
Document
A JSON object stored in an index, analogous to a table row.
Example: {"title": "iPhone 15", "price": 999}
Shard
A horizontal partition of an index. Spreading shards across nodes enables parallel processing.
Example: 5 primary shards across 5 nodes
Replica
A copy of a primary shard, used for redundancy and read scaling.
Example: 1 replica = 2 copies of each shard (the primary plus one replica)
3. Inverted Index
How Full-Text Search Works
Documents
Doc 1: "The quick brown fox"
Doc 2: "The quick rabbit"
Doc 3: "The brown dog"
Inverted Index
"quick" → [Doc 1, Doc 2]
"brown" → [Doc 1, Doc 3]
"fox" → [Doc 1]
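An inverted index maps each term to the documents containing it. Searching for "quick brown" with AND semantics intersects [1, 2] and [1, 3], giving [1]. (A match query defaults to OR, which would return the union of both lists ranked by relevance.) Finding a term's document list is a fast dictionary lookup, so query cost depends on the number of matching documents rather than on scanning every document.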
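The three example documents can be indexed in a few lines of Python. This is a minimal sketch of the idea only: it splits on whitespace and lowercases, whereas Lucene's analyzers and on-disk structures are far more involved. The function names `build_inverted_index` and `search_and` are made up for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_and(index, query):
    """Return IDs of documents that contain every term in the query."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

docs = {
    1: "The quick brown fox",
    2: "The quick rabbit",
    3: "The brown dog",
}
index = build_inverted_index(docs)
print(index["quick"])                    # [1, 2]
print(index["brown"])                    # [1, 3]
print(search_and(index, "quick brown"))  # [1]
```

A term that appears in no document yields an empty posting list, so any AND query containing it returns no results.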
4. Query Types
Match
Full-text search with relevance scoring. The query text is analyzed before matching.
Example: {"match": {"title": "quick fox"}}
Term
Exact match on a single term. The query text is not analyzed, so it suits keyword fields.
Example: {"term": {"status": "published"}}
Range
Numeric or date ranges.
Example: {"range": {"price": {"gte": 100, "lte": 500}}}
Bool
Combines queries with must, should, must_not, and filter clauses.
Example: {"bool": {"must": [...], "filter": [...]}}
Aggregation
Analytics over matching documents: count, avg, histogram, terms.
Example: {"aggs": {"by_category": {"terms": {"field": "category"}}}}
5. Use Cases
Full-Text Search
Product search, site search, document search
Amazon, Wikipedia, GitHub
Log Analytics
Centralized logging with ELK stack (Elasticsearch, Logstash, Kibana)
DevOps, security, debugging
Application Monitoring
APM, metrics, traces with Elastic APM
Distributed tracing
Security Analytics
SIEM, threat detection, anomaly detection
Elastic Security
6. Scaling Considerations
Shard Sizing
Aim for 10-50GB per shard. Too many small shards add per-shard overhead; too few large shards slow down queries and recovery.
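The 10-50GB guideline translates into a rough primary shard count. The sketch below assumes a target of 30GB per shard, which is simply a midpoint of that range, and `primary_shard_count` is a hypothetical helper, not part of any Elasticsearch API.

```python
import math

def primary_shard_count(index_size_gb, target_shard_gb=30):
    """Estimate how many primary shards keep each shard near the target size."""
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    # Always keep at least one primary shard, even for tiny indices.
    return max(1, math.ceil(index_size_gb / target_shard_gb))

print(primary_shard_count(600))      # 20
print(primary_shard_count(8))        # 1
print(primary_shard_count(600, 50))  # 12
```

Replicas are not included here: with 1 replica, the cluster stores twice this number of shards.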
Index Lifecycle
Use index lifecycle management (ILM) to move indices through phases: hot → warm → cold → delete. Essential for time-series data such as logs.
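A hot → warm → cold → delete policy might look like the request body below, built as a Python dict so it can be printed as JSON. The ages and sizes are illustrative assumptions, not recommendations. A body like this is sent to the ILM policy endpoint (PUT _ilm/policy/<policy-name>).

```python
import json

# Illustrative thresholds only; tune them to your retention requirements.
ilm_policy = {
    "policy": {
        "phases": {
            # Hot: actively written; roll over to a new index when large or old.
            "hot": {
                "actions": {
                    "rollover": {"max_primary_shard_size": "50gb", "max_age": "1d"}
                }
            },
            # Warm: no longer written; merge segments to make reads cheaper.
            "warm": {
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            # Cold: rarely queried; lower its recovery priority.
            "cold": {
                "min_age": "30d",
                "actions": {"set_priority": {"priority": 0}},
            },
            # Delete: remove the index once it is past retention.
            "delete": {"min_age": "90d", "actions": {"delete": {}}},
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```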
Memory
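Elasticsearch is memory-hungry. Set the JVM heap to no more than 50% of RAM, and keep it below about 32GB so the JVM can still use compressed object pointers. Leave the rest for the OS filesystem cache, which Lucene relies on.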
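The heap rule can be turned into JVM options with a little arithmetic. This sketch assumes a 31GB ceiling as a safe value just under the 32GB limit; `heap_settings` is a hypothetical helper that only formats the standard -Xms and -Xmx flags.

```python
def heap_settings(ram_gb, ceiling_gb=31):
    """Return -Xms/-Xmx flags for half of RAM, capped at the ceiling."""
    heap_gb = max(1, min(ram_gb // 2, ceiling_gb))
    # Setting min and max heap to the same value avoids heap resizing at runtime.
    return [f"-Xms{heap_gb}g", f"-Xmx{heap_gb}g"]

print(heap_settings(16))   # ['-Xms8g', '-Xmx8g']
print(heap_settings(128))  # ['-Xms31g', '-Xmx31g']
```

On a 128GB machine, half of RAM would be 64GB, so the ceiling applies and the remaining memory goes to the filesystem cache.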
Mapping
Define mappings explicitly. Dynamic mapping can cause issues at scale, such as wrongly inferred field types and uncontrolled growth in the number of fields.
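An explicit mapping for the products example could look like the index-creation body below. The field types are assumptions based on the sample document and queries earlier on this page, and the shard and replica counts reuse the examples from Core Concepts. Setting "dynamic" to "strict" makes Elasticsearch reject documents with unmapped fields instead of guessing their types.

```python
import json

# Body for creating a "products" index (PUT /products); field choices are illustrative.
index_body = {
    "settings": {
        "number_of_shards": 5,    # primary shards, fixed at index creation
        "number_of_replicas": 1,  # one replica per primary = 2 copies of each shard
    },
    "mappings": {
        "dynamic": "strict",  # reject fields that are not declared below
        "properties": {
            "title": {"type": "text"},        # analyzed, for match queries
            "price": {"type": "float"},       # numeric, for range queries
            "category": {"type": "keyword"},  # exact values, for terms aggregations
            "status": {"type": "keyword"},    # exact values, for term queries
        },
    },
}

print(json.dumps(index_body, indent=2))
```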
7. Key Takeaways
Quiz
1. E-commerce site needs product search with typo tolerance. Best choice?
2. What makes Elasticsearch search fast?