Graph Databases
When relationships are first-class citizens: social networks, recommendations, fraud detection.
1. The Social Network Analogy
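A graph database stores data as nodes (entities) and edges (relationships). It is optimized for traversing connections: in a native graph store, following an edge is a direct lookup from a node to its neighbors, so the cost of a relationship query depends mostly on how much of the graph the query touches rather than on the total size of the dataset.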
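To make the node-and-edge model concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j or any other product stores data; the `Graph` class, its method names, and the sample data are hypothetical, chosen only to show that a neighbor lookup reads the edges attached to one node instead of scanning every record.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: int
    target: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class Graph:
    """Toy property graph: nodes by id, plus outgoing edges grouped per node."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = defaultdict(list)

    def add_node(self, node_id, labels, **properties):
        self.nodes[node_id] = Node(node_id, frozenset(labels), properties)

    def add_edge(self, source, target, rel_type, **properties):
        self.outgoing[source].append(Edge(source, target, rel_type, properties))

    def neighbors(self, node_id, rel_type):
        # Only the edges stored with this node are read, so the work scales
        # with the node's degree, not with the total number of nodes.
        return [
            self.nodes[e.target]
            for e in self.outgoing[node_id]
            if e.rel_type == rel_type
        ]


graph = Graph()
graph.add_node(1, ["User"], name="Alice", age=30)
graph.add_node(2, ["User"], name="Bob")
graph.add_node(3, ["Product"], name="Laptop")
graph.add_edge(1, 2, "FOLLOWS", since="2023-01-15")
graph.add_edge(1, 3, "PURCHASED")

followed = [n.properties["name"] for n in graph.neighbors(1, "FOLLOWS")]
print(followed)
```

The edge carries its own type and properties, which is what lets a query filter on the relationship itself (for example, only `FOLLOWS` edges) without consulting a separate join table.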
2. Core Concepts
Nodes (Vertices)
Entities in your data: User, Product, Post, Location
(:User {name: 'Alice', age: 30})
Edges (Relationships)
Connections between nodes. Have a type and direction.
-[:FOLLOWS]->, -[:PURCHASED]->, -[:FRIENDS_WITH]-
Properties
Key-value attributes on nodes and edges
FOLLOWS {since: '2023-01-15'}
Labels
Categories for nodes. A node can have multiple labels.
(:User:Admin), (:Product:Electronics)
3. Graph vs Relational
| Query | Relational (SQL) | Graph (Cypher) |
|---|---|---|
| Friends of friends | 3+ JOINs, slow | Simple traversal |
| Shortest path | Recursive CTEs, complex | Built-in function |
| Pattern matching | Multiple JOINs | Visual pattern syntax |
| Aggregate analytics | Optimized | Less optimized |
Rule of thumb: If your queries are about relationships and traversals, use a graph DB. If they're about aggregations and reporting, use relational.
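The friends-of-friends row in the table can be illustrated side by side. The sketch below uses Python's built-in `sqlite3` module for the relational version and a plain adjacency map for the graph version; the `friendship` table, its column names, and the sample names are assumptions made for this example. In this simplified schema each hop costs one self-join, and a fuller schema with a separate users table would add more.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: directed FRIENDS_WITH pairs.
friendships = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Dave"),
    ("Carol", "Erin"),
]

# Relational approach: one self-join of the friendship table per hop.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (user_name TEXT, friend_name TEXT)")
conn.executemany("INSERT INTO friendship VALUES (?, ?)", friendships)
rows = conn.execute(
    """
    SELECT DISTINCT f2.friend_name
    FROM friendship AS f1
    JOIN friendship AS f2 ON f2.user_name = f1.friend_name
    WHERE f1.user_name = ?
    """,
    ("Alice",),
).fetchall()
sql_result = {name for (name,) in rows}
conn.close()

# Graph approach: follow adjacency lists outward, one hop at a time.
adjacency = defaultdict(set)
for user, friend in friendships:
    adjacency[user].add(friend)

graph_result = set()
for friend in adjacency["Alice"]:
    graph_result |= adjacency[friend]

print(sql_result == graph_result)
```

Both versions return the same people. The difference is in how the query grows: a third hop means another self-join in SQL, while the traversal only needs one more loop over neighbors.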
4. Cypher Query Language
Neo4j Cypher Examples
// Find all of Alice's friends (direction ignored, since friendship is mutual)
MATCH (a:User {name: 'Alice'})-[:FRIENDS_WITH]-(friend)
RETURN friend.name
// Friends of friends (exactly 2 hops), excluding Alice herself
MATCH (a:User {name: 'Alice'})-[:FRIENDS_WITH*2]-(fof)
WHERE fof <> a
RETURN DISTINCT fof.name
// Shortest path between two users
MATCH path = shortestPath(
(a:User {name: 'Alice'})-[*]-(b:User {name: 'Bob'})
)
RETURN path
// Recommend products bought by similar users
MATCH (u:User {name: 'Alice'})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(*) AS score ORDER BY score DESC
5. Use Cases
Social Networks
Friends, followers, mutual connections, feed ranking
Facebook, LinkedIn, Twitter
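Two of the social-network queries named here, mutual connections and degrees of separation, reduce to a set intersection and a breadth-first search. The sketch below is a plain-Python illustration of those ideas, not the algorithm any particular database uses internally; the `shortest_path` helper and the `friends` sample data are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Return one shortest path from start to goal as a list, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = current
            if neighbor == goal:
                # Walk the predecessor links back to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical undirected friendships, stored in both directions.
friends = {
    "Alice": ["Carol", "Dave"],
    "Carol": ["Alice", "Bob"],
    "Dave": ["Alice", "Erin"],
    "Erin": ["Dave", "Bob"],
    "Bob": ["Carol", "Erin"],
    "Zed": [],
}

path = shortest_path(friends, "Alice", "Bob")
mutual = set(friends["Alice"]) & set(friends["Bob"])
print(path, mutual)
```

Breadth-first search visits people in order of distance, so the first time it reaches the goal it has found a path with the fewest hops.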
Recommendation Engines
Users who bought X also bought Y. Similar content.
Netflix, Amazon, Spotify
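The "users who bought X also bought Y" idea can be sketched as a co-purchase count. The version below is a simplified illustration with made-up data, and the `recommend` function is a hypothetical helper: it scores each product the user does not own by the number of purchases shared with the users who do own it.

```python
from collections import Counter

# Hypothetical purchase history: user -> set of product names.
purchases = {
    "Alice": {"Laptop", "Mouse"},
    "Bob": {"Laptop", "Keyboard", "Monitor"},
    "Carol": {"Mouse", "Keyboard"},
    "Dave": {"Phone"},
}


def recommend(purchases, user):
    """Score unowned products by shared purchases with the users who own them."""
    owned = purchases[user]
    scores = Counter()
    for other, other_items in purchases.items():
        if other == user:
            continue
        shared = len(owned & other_items)
        if shared == 0:
            continue  # no overlap, so this user is not "similar"
        for product in other_items - owned:
            scores[product] += shared
    return scores.most_common()


recommendations = recommend(purchases, "Alice")
print(recommendations)
```

Real recommendation systems typically add weighting, recency, and normalization on top of a count like this; the sketch only shows the graph pattern of user, shared product, similar user, new product.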
Fraud Detection
Find suspicious patterns: circular transactions, identity links
Banks, payment processors
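A circular transaction is a cycle in the transfer graph: money leaves an account and eventually returns to it. The following depth-first search is one simple way to look for such a cycle. It is an illustrative sketch with hypothetical account names and a hypothetical `find_cycle` helper, and it is not tuned for large graphs, since it explores simple paths one by one.

```python
def find_cycle(transfers, start):
    """Return a path of accounts that leaves `start` and returns to it, or None."""
    stack = [(start, [start])]
    while stack:
        account, path = stack.pop()
        for receiver in transfers.get(account, ()):
            if receiver == start:
                return path + [start]
            if receiver not in path:
                # Extend the current simple path by one transfer.
                stack.append((receiver, path + [receiver]))
    return None


# Hypothetical transfers: sender -> list of receivers.
transfers = {
    "acct_A": ["acct_B"],
    "acct_B": ["acct_C"],
    "acct_C": ["acct_A", "acct_D"],
    "acct_D": [],
}

cycle = find_cycle(transfers, "acct_A")
print(cycle)
```

In practice a fraud check would usually also bound the path length and consider amounts and timestamps on each transfer; those are left out here to keep the traversal itself visible.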
Knowledge Graphs
Connect entities with semantic relationships
Google Knowledge Graph, Wikipedia
Network Infrastructure
Routers, switches, dependencies, impact analysis
Telecom, cloud providers
Access Control
Who has access to what through which groups/roles
Enterprise permissions
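"Who has access to what through which groups" is a reachability question over membership edges. As a rough sketch, assuming membership and grants are kept in two hypothetical dictionaries (`member_of` and `grants`), the traversal below collects every resource a principal can reach and records which group or principal holds the grant.

```python
def reachable_resources(member_of, grants, principal):
    """Collect resources granted to the principal directly or via nested groups."""
    seen = {principal}
    pending = [principal]
    resources = {}
    while pending:
        current = pending.pop()
        for resource in grants.get(current, ()):
            # Remember the first grant holder found for each resource.
            resources.setdefault(resource, current)
        for group in member_of.get(current, ()):
            if group not in seen:
                seen.add(group)
                pending.append(group)
    return resources


# Hypothetical data: who belongs to which group, and who is granted what.
member_of = {
    "alice": ["engineers"],
    "engineers": ["staff"],
    "bob": ["staff"],
}
grants = {
    "alice": ["alice-home"],
    "engineers": ["source-repo"],
    "staff": ["wiki"],
}

alice_access = reachable_resources(member_of, grants, "alice")
print(alice_access)
```

The `seen` set keeps the walk from looping if groups are nested in a cycle, which is a common data-quality problem in real permission hierarchies.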
6. Popular Graph Databases
Neo4j
Most popular. Cypher query language. ACID compliant.
Amazon Neptune
Managed service on AWS. Supports Gremlin, openCypher, and SPARQL.
JanusGraph
Open source, distributed. Pluggable storage (Cassandra, HBase).
Dgraph
Distributed, native GraphQL support.
7. Key Takeaways
Quiz
1. "Find all users within 3 connections of Alice." Which type of database fits best?
2. Graph DBs store relationships as: