NoSQL Graph Databases: Neo4j and ArangoDB
Key Insights
- Neo4j excels at pure graph workloads with its native graph storage and Cypher query language, making it ideal for social networks and knowledge graphs where relationship traversal is paramount.
- ArangoDB’s multi-model architecture allows you to combine document, key-value, and graph operations in a single query, reducing architectural complexity when your application needs flexible data modeling.
- For deep traversals and pattern matching, Neo4j outperformed ArangoDB by roughly 2-3x in the benchmarks below, but ArangoDB wins when you need to mix graph queries with document operations or require horizontal scaling without enterprise licensing.
Introduction to Graph Databases
Graph databases model data as nodes (entities) and edges (relationships), with both capable of storing properties. Unlike relational databases that use foreign keys and JOIN operations, graph databases make relationships first-class citizens, storing them directly alongside the data.
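This architectural difference becomes critical when dealing with connected data. Consider finding friends-of-friends in a social network. In a relational database, every additional level of depth adds another self-join. Each join looks up matching rows, typically through an index, in tables that grow with the whole dataset, so query cost climbs steeply as depth increases. A graph database instead follows stored references from each node to its neighbors. The cost of a hop therefore depends on how many relationships that node has rather than on the total size of the dataset, a property called “index-free adjacency.”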
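To make the contrast concrete, here is a toy Python model. It is not how either engine actually stores data, and the edge list and helper names are hypothetical. `neighbors_by_scan` stands in for a join by examining every row of an edge table. A real RDBMS would use an index, making each lookup logarithmic rather than linear, but the cost would still depend on table size. `neighbors_by_pointer` follows per-node neighbor lists, so its cost depends only on the node's degree.

```python
from collections import defaultdict

# Hypothetical toy data: who follows whom.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "david"), ("carol", "david")]

def neighbors_by_scan(table, node):
    """Join-style lookup: examine every row of the edge table (O(number of edges))."""
    return [dst for src, dst in table if src == node]

# Index-free adjacency: each node keeps direct references to its neighbors.
adjacency = defaultdict(list)
for src, dst in edges:
    adjacency[src].append(dst)

def neighbors_by_pointer(node):
    """Pointer-style lookup: cost depends only on this node's degree."""
    return adjacency.get(node, [])

def friends_of_friends(node, neighbors):
    """Two hops out from node, using whichever neighbor lookup is supplied."""
    return sorted({fof for friend in neighbors(node) for fof in neighbors(friend)})
```

Both strategies return `["david"]` for `"alice"`. Only the work done per hop differs.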
Graph databases shine in specific scenarios:
- Social networks: Friend recommendations, influence analysis, community detection
- Fraud detection: Identifying suspicious patterns across transactions and accounts
- Knowledge graphs: Connecting entities, concepts, and their relationships
- Recommendation engines: Product suggestions based on user behavior and similarities
When your queries frequently ask “how is X connected to Y?” or “what patterns exist in these relationships?”, graph databases deserve serious consideration.
Neo4j Fundamentals
Neo4j pioneered the property graph model and remains the most popular graph database. Its native graph storage engine stores nodes and relationships as fixed-size records with direct pointers, enabling true index-free adjacency.
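The following heavily simplified sketch shows why fixed-size records make pointer chasing cheap. Because every record has the same size, a record's ID doubles as its byte offset, so following a pointer becomes an offset calculation rather than an index lookup. The layout below is an assumption for illustration only. It uses a 4-byte node record holding the first outgoing relationship ID and a 12-byte relationship record holding the start, end, and next-relationship IDs. Neo4j's actual record formats carry more fields, such as property pointers and relationship chains for both endpoints.

```python
import struct

# Hypothetical fixed-size layouts (little-endian, no padding).
NODE_FMT = struct.Struct("<i")    # first outgoing relationship id, -1 = none
REL_FMT = struct.Struct("<iii")   # start node id, end node id, next relationship id of start node

class RecordStore:
    """Toy store: record id * record size = byte offset in the backing buffer."""

    def __init__(self):
        self.nodes = bytearray()
        self.rels = bytearray()

    def add_node(self):
        node_id = len(self.nodes) // NODE_FMT.size
        self.nodes += NODE_FMT.pack(-1)
        return node_id

    def add_rel(self, start, end):
        rel_id = len(self.rels) // REL_FMT.size
        (first,) = NODE_FMT.unpack_from(self.nodes, start * NODE_FMT.size)
        # Prepend the new relationship to the start node's chain.
        self.rels += REL_FMT.pack(start, end, first)
        NODE_FMT.pack_into(self.nodes, start * NODE_FMT.size, rel_id)
        return rel_id

    def outgoing(self, node_id):
        """Walk the chain by offset arithmetic; no index is consulted."""
        (rel_id,) = NODE_FMT.unpack_from(self.nodes, node_id * NODE_FMT.size)
        while rel_id != -1:
            _, end, rel_id = REL_FMT.unpack_from(self.rels, rel_id * REL_FMT.size)
            yield end
```

Suppose you add relationships A->B and then A->C. `list(store.outgoing(a))` walks the chain starting from A's node record and yields C, then B, because each new relationship is prepended.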
Cypher, Neo4j’s declarative query language, uses ASCII-art syntax that mirrors how we naturally draw graphs on whiteboards:
// Create users and relationships
CREATE (alice:User {name: 'Alice', age: 30})
CREATE (bob:User {name: 'Bob', age: 28})
CREATE (carol:User {name: 'Carol', age: 32})
CREATE (david:User {name: 'David', age: 29})
CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (alice)-[:FOLLOWS]->(carol)
CREATE (bob)-[:FOLLOWS]->(david)
CREATE (carol)-[:FOLLOWS]->(david);
// Find who Alice should follow (friend-of-friend recommendations)
MATCH (user:User {name: 'Alice'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(recommendation)
WHERE NOT (user)-[:FOLLOWS]->(recommendation) AND user <> recommendation
RETURN recommendation.name, COUNT(*) AS mutualFriends
ORDER BY mutualFriends DESC
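If you are more at home in a general-purpose language, the recommendation query computes roughly the following. This plain-Python sketch uses a hypothetical `follows` dict holding the same four users. Like `COUNT(*)` in the Cypher query, it counts one hit per friend-of-friend path.

```python
from collections import Counter

# Hypothetical in-memory copy of the sample graph: user -> set of users they follow.
follows = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"David"},
    "Carol": {"David"},
    "David": set(),
}

def recommend(user):
    """Count friend-of-friend paths, skipping the user and anyone already followed."""
    counts = Counter()
    for friend in follows[user]:
        for candidate in follows[friend]:
            if candidate != user and candidate not in follows[user]:
                counts[candidate] += 1
    return counts.most_common()
```

`recommend("Alice")` returns `[("David", 2)]`, matching the two mutual friends (Bob and Carol) in the sample data.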
Neo4j’s pattern matching capabilities make complex relationship queries intuitive:
// Find shortest path between users
MATCH path = shortestPath(
(alice:User {name: 'Alice'})-[:FOLLOWS*]-(david:User {name: 'David'})
)
RETURN path
// Variable-length pattern matching for influencers
MATCH (user:User)-[:FOLLOWS*2..3]->(influencer:User)
WHERE NOT (user)-[:FOLLOWS]->(influencer) AND user <> influencer
WITH influencer, COUNT(DISTINCT user) AS reach
WHERE reach > 100
RETURN influencer.name, reach
ORDER BY reach DESC
LIMIT 10
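Conceptually, `shortestPath` over an unweighted, undirected pattern such as `-[:FOLLOWS*]-` answers the same question as a breadth-first search. The Python sketch below illustrates only that idea, not Neo4j's implementation. The `shortest_path` helper and the edge list are hypothetical.

```python
from collections import deque

# Hypothetical sample relationships, treated as undirected like -[:FOLLOWS*]-.
edges = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "David"), ("Carol", "David")]

def shortest_path(edges, start, goal):
    """Breadth-first search; returns the node list of a fewest-hops path, or None."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk predecessors back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

With the sample data, `shortest_path(edges, "Alice", "David")` returns `["Alice", "Bob", "David"]`, which is two hops. BFS finds a fewest-hops path because it explores every node at distance k before any node at distance k+1.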
Neo4j Community Edition is free but single-instance only. Clustering and advanced features require Enterprise licensing, which can be expensive at scale.
ArangoDB Fundamentals
ArangoDB takes a different approach as a multi-model database supporting documents, graphs, and key-value pairs within a single engine. This flexibility can eliminate the need for separate database systems in complex applications.
In ArangoDB, graphs are built on top of document collections. You define edge collections that reference documents in vertex collections:
// Create collections (arangosh, the ArangoDB JavaScript shell)
db._create("users");
db._createEdgeCollection("follows");
// Insert users
db.users.save({_key: "alice", name: "Alice", age: 30});
db.users.save({_key: "bob", name: "Bob", age: 28});
db.users.save({_key: "carol", name: "Carol", age: 32});
db.users.save({_key: "david", name: "David", age: 29});
// Create relationships
db.follows.save({_from: "users/alice", _to: "users/bob"});
db.follows.save({_from: "users/alice", _to: "users/carol"});
db.follows.save({_from: "users/bob", _to: "users/david"});
db.follows.save({_from: "users/carol", _to: "users/david"});
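The `_from`/`_to` convention is easy to model. Every document has an `_id` of the form `collection/_key`, and edge documents point at those IDs. ArangoDB automatically maintains an edge index on `_from` and `_to`. In this hypothetical in-memory sketch, a dict keyed by `_from` plays that role, so an outbound step is a direct lookup. The `document` and `outbound` helpers are illustrative names, not the python-arango API.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the "users" and "follows" collections.
collections = {
    "users": {key: {"_key": key, "_id": f"users/{key}", "name": key.capitalize()}
              for key in ("alice", "bob", "carol", "david")},
}
follows = [
    {"_from": "users/alice", "_to": "users/bob"},
    {"_from": "users/alice", "_to": "users/carol"},
    {"_from": "users/bob", "_to": "users/david"},
    {"_from": "users/carol", "_to": "users/david"},
]

# Stand-in for the edge index on _from: vertex _id -> list of target _ids.
outbound_index = defaultdict(list)
for edge in follows:
    outbound_index[edge["_from"]].append(edge["_to"])

def document(doc_id):
    """Resolve an _id such as "users/bob" to its document."""
    collection, key = doc_id.split("/", 1)
    return collections[collection][key]

def outbound(doc_id):
    """One OUTBOUND step over the follows edges."""
    return [document(target) for target in outbound_index.get(doc_id, [])]
```

`[d["name"] for d in outbound("users/alice")]` gives `["Bob", "Carol"]`.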
AQL (ArangoDB Query Language) combines SQL-like syntax with graph traversal capabilities:
// Friend-of-friend recommendations
FOR user IN users
  FILTER user.name == "Alice"
  LET alreadyFollowing = (FOR f IN 1..1 OUTBOUND user follows RETURN f._id)
  FOR friend IN 1..1 OUTBOUND user follows
    FOR recommendation IN 1..1 OUTBOUND friend follows
      FILTER recommendation._id != user._id
      FILTER recommendation._id NOT IN alreadyFollowing
      COLLECT rec = recommendation WITH COUNT INTO mutualCount
      SORT mutualCount DESC
      RETURN {name: rec.name, mutualFriends: mutualCount}
ArangoDB’s strength lies in mixing data models. You can combine document queries with graph traversals:
// Find products matching the user’s interests that were bought by people 2-3 hops away in the follow graph
FOR user IN users
FILTER user._key == @userId
FOR friend IN 2..3 OUTBOUND user follows
FOR purchase IN purchases
FILTER purchase.userId == friend._key
FOR product IN products
FILTER product._key == purchase.productId
FILTER product.category IN user.interests
COLLECT p = product WITH COUNT INTO popularity
SORT popularity DESC
LIMIT 10
RETURN p
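The mixed query runs in three phases: a graph traversal, two document joins, and a group-and-count. The Python sketch below mirrors those phases over hypothetical sample data shaped like the `users`, `purchases`, and `products` collections. One design choice differs from the AQL as written. This traversal visits each vertex once, whereas an AQL traversal by default may emit the same vertex once per distinct path, which would count that person's purchases more than once. AQL's `OPTIONS { uniqueVertices: "global", order: "bfs" }` requests once-per-vertex behavior.

```python
from collections import Counter

# Hypothetical sample data shaped like the collections the query uses.
follows = {"u1": ["u2"], "u2": ["u3"], "u3": ["u4"], "u4": []}
users = {"u1": {"_key": "u1", "interests": ["books"]}}
purchases = [
    {"userId": "u3", "productId": "p1"},
    {"userId": "u4", "productId": "p1"},
    {"userId": "u4", "productId": "p2"},
]
products = {
    "p1": {"_key": "p1", "category": "books"},
    "p2": {"_key": "p2", "category": "games"},
}

def within_hops(start, min_depth, max_depth):
    """Level-by-level traversal returning each vertex once, at depths min..max."""
    seen, frontier, found = {start}, [start], []
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for node in frontier:
            for neighbor in follows.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if depth >= min_depth:
            found.extend(next_frontier)
        frontier = next_frontier
    return found

def popular_products(user_key, limit=10):
    """Graph step, two document joins, then a group-and-count."""
    interests = set(users[user_key]["interests"])
    popularity = Counter()
    for person in within_hops(user_key, 2, 3):            # FOR friend IN 2..3 OUTBOUND
        for purchase in purchases:                          # FOR purchase IN purchases
            if purchase["userId"] == person:
                product = products[purchase["productId"]]  # FOR product IN products
                if product["category"] in interests:
                    popularity[product["_key"]] += 1        # COLLECT ... WITH COUNT INTO
    return popularity.most_common(limit)
```

Here `popular_products("u1")` returns `[("p1", 2)]`. The two people two and three hops away both bought the book, and the game is filtered out by the interests check.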
Performance Comparison
Performance depends heavily on workload characteristics. For pure graph traversals, Neo4j’s native storage provides advantages:
// Neo4j: Shortest path (optimized native algorithm)
MATCH path = shortestPath(
(a:Person {id: 1})-[:KNOWS*]-(b:Person {id: 1000})
)
RETURN length(path)
// Typical: 15-30ms for 6-degree separation in 1M node graph
// ArangoDB: Shortest path (emits one row per vertex along the path)
FOR v IN OUTBOUND SHORTEST_PATH
  'persons/1' TO 'persons/1000'
  GRAPH 'socialGraph'
  RETURN v
// Typical: 40-80ms for same dataset
Benchmark Results (1M nodes, 10M edges, commodity hardware):
| Operation | Neo4j | ArangoDB | Winner |
|---|---|---|---|
| Single-hop traversal | 2ms | 3ms | Neo4j |
| 3-hop pattern match | 45ms | 120ms | Neo4j |
| Shortest path (6 degrees) | 25ms | 65ms | Neo4j |
| Document + graph query | N/A | 80ms | ArangoDB |
| Bulk document insert | 450/sec | 1200/sec | ArangoDB |
Neo4j wins pure graph operations, but ArangoDB’s multi-model flexibility can eliminate entire system components. If you need both document storage and graph queries, ArangoDB’s “slower” graph performance may still result in better overall application performance.
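The relative speedups implied by the table are simple ratios. This short sketch reuses only the table's numbers and adds no new measurements.

```python
# Figures copied from the benchmark table: (Neo4j, ArangoDB).
latency_ms = {
    "Single-hop traversal": (2, 3),
    "3-hop pattern match": (45, 120),
    "Shortest path (6 degrees)": (25, 65),
}
inserts_per_sec = {"Bulk document insert": (450, 1200)}

def speedups():
    """Speedup ratio per operation, rounded to one decimal place."""
    ratios = {}
    for op, (neo4j, arango) in latency_ms.items():
        ratios[op] = round(arango / neo4j, 1)   # lower latency wins: >1 means Neo4j is faster
    for op, (neo4j, arango) in inserts_per_sec.items():
        ratios[op] = round(arango / neo4j, 1)   # higher throughput wins: >1 means ArangoDB is faster
    return ratios

for op, ratio in speedups().items():
    print(f"{op}: {ratio}x")
```

Deep traversals come out at about 2.6-2.7x in Neo4j's favor and single hops at 1.5x. Bulk inserts come out at about 2.7x in ArangoDB's favor.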
Use Case Analysis
Choose Neo4j when:
Your workload is graph-centric with deep traversals and complex pattern matching. Knowledge graphs exemplify this perfectly:
// Knowledge graph: Find related concepts through multiple relationship types
MATCH path = (concept:Concept {name: 'Machine Learning'})-[:RELATED_TO|PART_OF|ENABLES*1..4]-(related:Concept)
WHERE related <> concept AND ALL(rel IN relationships(path) WHERE rel.strength > 0.7)
RETURN DISTINCT related.name,
[rel IN relationships(path) | type(rel)] AS relationshipPath,
reduce(score = 1.0, rel IN relationships(path) | score * rel.strength) AS relevanceScore
ORDER BY relevanceScore DESC
LIMIT 20
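The scoring in this query works as follows. Enumerate paths of one to four hops over the allowed relationship types, keep only paths whose every relationship is stronger than 0.7, and score each path by multiplying its strengths (the `reduce` expression). The Python sketch below reproduces that logic on a hypothetical concept graph. Dropping weak edges before traversal is equivalent to the `ALL(...)` filter, since any path containing a weak edge would be rejected anyway. Like Cypher, the walk never reuses a relationship within one path.

```python
from math import prod

# Hypothetical undirected concept graph: (a, b, relationship type, strength).
relations = [
    ("Machine Learning", "Statistics", "RELATED_TO", 0.9),
    ("Statistics", "Probability", "PART_OF", 0.8),
    ("Machine Learning", "Recommendation", "ENABLES", 0.95),
    ("Machine Learning", "Astrology", "RELATED_TO", 0.2),
]
ALLOWED = {"RELATED_TO", "PART_OF", "ENABLES"}

def related_concepts(start, max_hops=4, threshold=0.7):
    """Return (concept, relationship types along path, relevance score), best first."""
    adjacency = {}
    for rel_id, (a, b, rel_type, strength) in enumerate(relations):
        if rel_type in ALLOWED and strength > threshold:
            adjacency.setdefault(a, []).append((rel_id, b, rel_type, strength))
            adjacency.setdefault(b, []).append((rel_id, a, rel_type, strength))
    results = []

    def walk(node, used, types, strengths):
        if strengths and node != start:
            results.append((node, tuple(types), prod(strengths)))
        if len(strengths) == max_hops:
            return
        for rel_id, nxt, rel_type, strength in adjacency.get(node, []):
            if rel_id not in used:  # each relationship at most once per path
                walk(nxt, used | {rel_id}, types + [rel_type], strengths + [strength])

    walk(start, frozenset(), [], [])
    return sorted(results, key=lambda r: r[2], reverse=True)
```

For `"Machine Learning"`, the sketch ranks Recommendation (0.95), then Statistics (0.9), then Probability (0.9 × 0.8 ≈ 0.72). Astrology is excluded by the strength threshold.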
Choose ArangoDB when:
You need flexible data modeling with occasional graph queries. An e-commerce platform demonstrates this well:
// E-commerce: Mix product catalog with social recommendations
FOR user IN users
FILTER user._key == @currentUser
// Get user's purchase history (document query)
LET purchases = (
FOR order IN orders
FILTER order.userId == user._key
RETURN order.productId
)
// Find similar users (graph traversal)
FOR similar IN 2..3 OUTBOUND user follows
// Get their recent purchases not in user's history
FOR order IN orders
FILTER order.userId == similar._key
FILTER order.productId NOT IN purchases
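// Compare as numbers: DATE_SUBTRACT returns an ISO 8601 string, DATE_TIMESTAMP normalizes both sides
FILTER DATE_TIMESTAMP(order.timestamp) > DATE_TIMESTAMP(DATE_SUBTRACT(DATE_NOW(), 30, 'days'))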
// Enrich with product details (document join)
FOR product IN products
FILTER product._key == order.productId
COLLECT p = product WITH COUNT INTO popularity
SORT popularity DESC
LIMIT 10
RETURN p
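The two filters that distinguish this query from the earlier one are the exclusion of products the user already owns and the 30-day window. The Python sketch below isolates those steps. It assumes, hypothetically, that order timestamps are ISO 8601 strings with an explicit UTC offset, and that `similar_users` is a set produced by the traversal step.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def recent_unpurchased(orders, current_user, similar_users, now, days=30, limit=10):
    """Count products that similar users ordered in the last `days` days
    and that the current user has not bought yet."""
    already_bought = {o["productId"] for o in orders if o["userId"] == current_user}
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for o in orders:
        if (o["userId"] in similar_users
                and o["productId"] not in already_bought
                and datetime.fromisoformat(o["timestamp"]) > cutoff):
            counts[o["productId"]] += 1
    return counts.most_common(limit)

# Hypothetical sample orders.
orders = [
    {"userId": "me", "productId": "p1", "timestamp": "2024-06-01T00:00:00+00:00"},
    {"userId": "u2", "productId": "p1", "timestamp": "2024-06-20T00:00:00+00:00"},
    {"userId": "u2", "productId": "p2", "timestamp": "2024-06-20T00:00:00+00:00"},
    {"userId": "u3", "productId": "p2", "timestamp": "2024-06-25T00:00:00+00:00"},
    {"userId": "u3", "productId": "p3", "timestamp": "2024-01-01T00:00:00+00:00"},
]
now = datetime(2024, 6, 30, tzinfo=timezone.utc)
```

With these orders, `recent_unpurchased(orders, "me", {"u2", "u3"}, now)` returns `[("p2", 2)]`. `p1` is excluded because the user already owns it, and `p3` is excluded because it falls outside the window.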
Integration and Deployment
Both databases offer comprehensive driver support. Here’s Python integration:
# Neo4j with the official driver (pip install neo4j)
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

def get_recommendations(user_name):
    with driver.session() as session:
        result = session.run("""
            MATCH (u:User {name: $name})-[:FOLLOWS]->()-[:FOLLOWS]->(rec)
            WHERE NOT (u)-[:FOLLOWS]->(rec) AND rec <> u
            RETURN rec.name AS name, COUNT(*) AS score
            ORDER BY score DESC LIMIT 5
            """, name=user_name)
        # Consume the result inside the session, before it closes
        return [record.data() for record in result]

# ArangoDB with python-arango (pip install python-arango)
from arango import ArangoClient

client = ArangoClient(hosts='http://localhost:8529')
db = client.db('social', username='root', password='password')

def get_recommendations(user_key):
    aql = """
        FOR user IN users FILTER user._key == @key
          LET following = (FOR f IN 1..1 OUTBOUND user follows RETURN f._key)
          FOR friend IN 1..1 OUTBOUND user follows
            FOR rec IN 1..1 OUTBOUND friend follows
              FILTER rec._key != @key AND rec._key NOT IN following
              COLLECT r = rec WITH COUNT INTO score
              SORT score DESC LIMIT 5
              RETURN {name: r.name, score: score}
        """
    return list(db.aql.execute(aql, bind_vars={'key': user_key}))
Docker deployment for development:
# docker-compose.yml
version: '3.8'
services:
  neo4j:
    image: neo4j:5.13
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      NEO4J_AUTH: neo4j/password
    volumes:
      - neo4j_data:/data
  arangodb:
    image: arangodb:3.11
    ports:
      - "8529:8529"
    environment:
      ARANGO_ROOT_PASSWORD: password
    volumes:
      - arango_data:/var/lib/arangodb3
volumes:
  neo4j_data:
  arango_data:
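After `docker compose up -d`, the containers can take a few seconds before they accept connections, so scripts that run immediately afterwards benefit from a small readiness check. This sketch only tests that a TCP port accepts connections: 7687 for Bolt and 8529 for ArangoDB, per the compose file. An open port does not guarantee the database is ready for queries. A driver-level check, such as the Neo4j driver's `verify_connectivity()`, is stricter. The `wait_for_port` helper is hypothetical.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds; False once timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `all(wait_for_port("localhost", p) for p in (7687, 8529))` blocks until both services are listening, or gives up after 30 seconds per port.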
Conclusion and Decision Framework
Your choice between Neo4j and ArangoDB should align with your data model and query patterns:
Select Neo4j if:
- Graph queries dominate your workload (>70% of operations)
- You need maximum performance for deep traversals and pattern matching
- Your team values specialized graph expertise and tooling
- Budget accommodates enterprise licensing for production clustering
Select ArangoDB if:
- You need multiple data models in one system
- Graph queries complement document/key-value operations
- Horizontal scaling is required without enterprise costs
- You prefer architectural simplicity over specialized performance
Both databases are production-ready, but they optimize for different scenarios. Neo4j is the scalpel for graph surgery; ArangoDB is the Swiss Army knife for varied data challenges. Choose based on whether you need depth or breadth in your data platform.