Module 10 - Specialized Databases

Graph Databases

When relationships are first-class citizens: social networks, recommendations, fraud detection.

1. The Social Network Analogy

Simple Analogy
In a relational DB, finding "friends of friends" requires multiple self-JOINs on a friendship table, and every extra hop adds another JOIN whose cost grows with the data. A graph DB stores relationships directly: traversing from Alice → her friends → their friends is just following pointers. On large, densely connected data this can mean milliseconds instead of seconds.

A graph database stores data as nodes (entities) and edges (relationships). It is optimized for traversing connections, so the cost of a relationship query depends mainly on how much of the graph the query touches, not on the total size of the dataset.

2. Core Concepts

Nodes (Vertices)

Entities in your data: User, Product, Post, Location

(:User {name: 'Alice', age: 30})

Edges (Relationships)

Connections between nodes. Each edge has a type and a direction; a pattern written without an arrowhead matches either direction.

-[:FOLLOWS]->, -[:PURCHASED]->, -[:FRIENDS_WITH]-

Properties

Key-value attributes on nodes and edges

-[:FOLLOWS {since: '2023-01-15'}]->

Labels

Categories for nodes. A node can have multiple labels.

(:User:Admin), (:Product:Electronics)
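The four concepts above fit together as a "property graph". The following Python sketch models them with plain dataclasses; the `Node` and `Edge` classes and their field names are assumptions made for illustration, not the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"User", "Admin"}; several labels allowed
    properties: dict = field(default_factory=dict)   # key-value attributes


@dataclass
class Edge:
    start: Node                                      # direction runs from start to end
    end: Node
    rel_type: str                                    # e.g. "FOLLOWS"
    properties: dict = field(default_factory=dict)   # edges carry properties too


alice = Node({"User", "Admin"}, {"name": "Alice", "age": 30})
bob = Node({"User"}, {"name": "Bob"})
follows = Edge(alice, bob, "FOLLOWS", {"since": "2023-01-15"})

print(follows.start.properties["name"], follows.rel_type, follows.end.properties["name"])
# Alice FOLLOWS Bob
```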

3. Graph vs Relational

Query               | Relational (SQL)        | Graph (Cypher)
Friends of friends  | 3+ JOINs, slow          | Simple traversal
Shortest path       | Recursive CTEs, complex | Built-in function
Pattern matching    | Multiple JOINs          | Visual pattern syntax
Aggregate analytics | Optimized               | Less optimized

Rule of thumb: If your queries are about relationships and traversals, use a graph DB. If they're about aggregations and reporting, use relational.
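For comparison, this is roughly what the relational side of the "friends of friends" row looks like. The sketch uses Python's built-in `sqlite3` module with an in-memory database; the `users` and `friendships` tables and their columns are hypothetical. Note one JOIN per hop plus JOINs back to `users` to resolve names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO friendships VALUES (1, 2), (1, 3), (2, 4), (3, 4);
""")

# Each hop is another self-JOIN on the friendships table.
rows = conn.execute("""
    SELECT DISTINCT fof.name
    FROM users AS a
    JOIN friendships AS f1 ON f1.user_id = a.id          -- hop 1
    JOIN friendships AS f2 ON f2.user_id = f1.friend_id  -- hop 2
    JOIN users AS fof ON fof.id = f2.friend_id
    WHERE a.name = 'Alice'
""").fetchall()

fof_names = [name for (name,) in rows]
print(fof_names)  # ['Dave']
```

Going to three or four hops means adding a JOIN for each, which is where the query becomes hard to write and to optimize.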

4. Cypher Query Language

Neo4j Cypher Examples
// Find all of Alice's friends
MATCH (a:User {name: 'Alice'})-[:FRIENDS_WITH]->(friend)
RETURN friend.name

// Friends of friends (exactly 2 hops), excluding Alice and her direct friends
MATCH (a:User {name: 'Alice'})-[:FRIENDS_WITH*2]->(fof)
WHERE fof <> a AND NOT (a)-[:FRIENDS_WITH]->(fof)
RETURN DISTINCT fof.name

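// Shortest path between two users, ignoring relationship direction.
// The upper bound (15 hops, an arbitrary example value) limits how far the search expands.
MATCH path = shortestPath(
  (a:User {name: 'Alice'})-[*..15]-(b:User {name: 'Bob'})
)
RETURN path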

// Recommend products bought by similar users
// (users who share at least one purchase with Alice)
MATCH (u:User {name: 'Alice'})-[:PURCHASED]->(p:Product)<-[:PURCHASED]-(similar)
MATCH (similar)-[:PURCHASED]->(rec:Product)
WHERE NOT (u)-[:PURCHASED]->(rec)
RETURN rec.name, COUNT(*) AS score
ORDER BY score DESC
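Conceptually, an unweighted shortest-path query like the one above can be answered with a breadth-first search. The Python sketch below illustrates the idea on a small undirected adjacency dictionary; it is not Neo4j's implementation, and `shortest_path` and the sample graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path, or None."""
    if start == goal:
        return [start]
    parents = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in parents:
                continue
            parents[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to the start to rebuild the path.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Undirected graph: every edge is listed in both directions.
graph = {
    "Alice": ["Carol", "Dave"],
    "Carol": ["Alice", "Bob"],
    "Dave": ["Alice", "Erin"],
    "Erin": ["Dave", "Bob"],
    "Bob": ["Carol", "Erin"],
}
print(shortest_path(graph, "Alice", "Bob"))  # ['Alice', 'Carol', 'Bob']
```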

5. Use Cases

Social Networks

Friends, followers, mutual connections, feed ranking

Facebook, LinkedIn, Twitter

Recommendation Engines

Users who bought X also bought Y. Similar content.

Netflix, Amazon, Spotify
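A minimal sketch of the "users who bought X also bought Y" idea, written in Python over an in-memory dictionary. The `purchases` data and the `recommend` function are hypothetical; the scoring (one point per shared purchase with a similar user) is a simple assumption, not how any of the named companies rank results.

```python
from collections import Counter

# Hypothetical purchase data: user -> set of products bought.
purchases = {
    "Alice": {"laptop", "mouse"},
    "Bob": {"laptop", "keyboard", "monitor"},
    "Carol": {"mouse", "keyboard"},
    "Dave": {"phone"},
}


def recommend(purchases, user):
    """Rank products bought by users who share at least one purchase with `user`."""
    owned = purchases[user]
    scores = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        shared = len(owned & items)      # how many purchases the two users share
        if shared == 0:
            continue                     # not a "similar" user
        for item in items - owned:       # only products the user does not own yet
            scores[item] += shared
    return scores.most_common()


print(recommend(purchases, "Alice"))  # [('keyboard', 2), ('monitor', 1)]
```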

Fraud Detection

Find suspicious patterns: circular transactions, identity links

Banks, payment processors
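Circular transactions correspond to cycles in a directed graph of transfers. The Python sketch below searches for a chain of transfers that returns to the starting account; `transfers` and `find_cycle` are hypothetical names, and a production system would also bound the search depth and consider amounts and timestamps.

```python
def find_cycle(transfers, start):
    """Depth-first search for a chain of transfers leading from `start` back to `start`."""
    stack = [(start, [start])]
    while stack:
        account, path = stack.pop()
        for receiver in transfers.get(account, ()):
            if receiver == start:
                return path + [start]          # money came back to where it started
            if receiver not in path:           # do not revisit accounts on this chain
                stack.append((receiver, path + [receiver]))
    return None


# Hypothetical data: account -> accounts it sent money to.
transfers = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A", "D"],
    "D": [],
}
print(find_cycle(transfers, "A"))  # ['A', 'B', 'C', 'A']
print(find_cycle(transfers, "D"))  # None
```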

Knowledge Graphs

Connect entities with semantic relationships

Google Knowledge Graph, Wikidata

Network Infrastructure

Routers, switches, dependencies, impact analysis

Telecom, cloud providers

Access Control

Who has access to what through which groups/roles

Enterprise permissions
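"Through which groups/roles" is a path question: follow membership edges from a user until a node that holds the permission is reached. A minimal Python sketch, assuming hypothetical `member_of` and `grants` dictionaries:

```python
def access_path(member_of, grants, user, resource):
    """Return the membership chain that gives `user` access to `resource`, or None."""
    stack = [[user]]
    seen = set()
    while stack:
        path = stack.pop()
        principal = path[-1]
        if principal in seen:
            continue                              # guards against membership loops
        seen.add(principal)
        if resource in grants.get(principal, ()):
            return path
        for group in member_of.get(principal, ()):
            stack.append(path + [group])
    return None


# Hypothetical data: who belongs to what, and what each group may access.
member_of = {
    "alice": ["engineering"],
    "engineering": ["staff"],
    "bob": ["contractors"],
}
grants = {
    "staff": {"wiki"},
    "engineering": {"source-repo"},
}
print(access_path(member_of, grants, "alice", "wiki"))  # ['alice', 'engineering', 'staff']
print(access_path(member_of, grants, "bob", "wiki"))    # None
```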

6. Popular Graph Databases

Neo4j

The most widely used graph database. Uses the Cypher query language and is ACID compliant.

Cypher, ACID transactions, Great tooling

Amazon Neptune

Managed by AWS. Supports Gremlin, openCypher, and SPARQL.

Fully managed, Multi-AZ, Serverless option

JanusGraph

Open source, distributed. Pluggable storage (Cassandra, HBase).

Horizontal scaling, Gremlin, Open source

Dgraph

Distributed, native GraphQL support.

GraphQL native, Horizontal scaling, ACID

7. Key Takeaways

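1. Graph DBs excel at relationship-heavy queries: social, recommendations, fraud
2. Nodes = entities, Edges = relationships, both have properties
3. Following a relationship is a direct pointer lookup, so a hop costs time proportional to the relationships touched rather than to total data size, unlike JOINs, whose cost grows with table size
4. Use relational for aggregations, graph for connections
5. Neo4j (Cypher) is the most popular; Neptune is the AWS-managed option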

Quiz

1. 'Find all users within 3 connections of Alice'. Best DB?

2. Graph DBs store relationships as: