NoSQL Graph Databases: Neo4j and ArangoDB

Key Insights

  • Neo4j excels at pure graph workloads with its native graph storage and Cypher query language, making it ideal for social networks and knowledge graphs where relationship traversal is paramount.
  • ArangoDB’s multi-model architecture allows you to combine document, key-value, and graph operations in a single query, reducing architectural complexity when your application needs flexible data modeling.
  • For deep traversals and pattern matching, Neo4j ran roughly 2-3x faster than ArangoDB in the benchmarks below, but ArangoDB wins when you need to mix graph queries with document operations or want horizontal scaling without an enterprise license (check the Community Edition license terms for your ArangoDB version).

Introduction to Graph Databases

Graph databases model data as nodes (entities) and edges (relationships), with both capable of storing properties. Unlike relational databases that use foreign keys and JOIN operations, graph databases make relationships first-class citizens, storing them directly alongside the data.

This architectural difference becomes critical when dealing with connected data. Consider finding friends-of-friends in a social network. In a relational database, every additional hop adds another self-join, and the intermediate result sets can grow multiplicatively with depth. A graph database instead follows stored pointers from each node to its neighbors, so the cost of a hop depends on the number of relationships actually traversed rather than on the total size of the dataset. This property is called “index-free adjacency.”
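The friends-of-friends contrast can be sketched in a few lines of Python. This is a toy model on hypothetical in-memory data, not how either database works internally: the join-style version scans the whole edge table for every hop (an unindexed self-join), while the adjacency version follows a per-node neighbor list that a graph store would maintain at write time. Both report how many edge entries they had to examine.

```python
from collections import defaultdict

# Hypothetical "follows" edge table: (follower, followed)
EDGES = [
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "david"), ("carol", "david"),
    ("erin", "frank"), ("frank", "grace"),  # unrelated rows still get scanned
]

def fof_by_scan(edges, start):
    """Join-style: scan the whole edge table once per hop (no index)."""
    examined = 0
    friends = set()
    for src, dst in edges:                 # hop 1: one full scan
        examined += 1
        if src == start:
            friends.add(dst)
    candidates = set()
    for friend in friends:                 # hop 2: one full scan per friend
        for src, dst in edges:
            examined += 1
            if src == friend:
                candidates.add(dst)
    return candidates - friends - {start}, examined

def build_adjacency(edges):
    """Built once at write time in a graph store, not per query."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)
    return adjacency

def fof_by_adjacency(adjacency, start):
    """Pointer-style: only the neighbors of visited nodes are touched."""
    friends = set(adjacency[start])
    examined = len(adjacency[start])
    candidates = set()
    for friend in friends:
        for dst in adjacency[friend]:
            examined += 1
            candidates.add(dst)
    return candidates - friends - {start}, examined

print(fof_by_scan(EDGES, "alice"))                        # ({'david'}, 18)
print(fof_by_adjacency(build_adjacency(EDGES), "alice"))  # ({'david'}, 4)
```

The scan cost grows with the size of the edge table times the number of hops, while the adjacency cost grows only with the neighborhood actually visited. Real relational engines use indexes rather than full scans, which turns each hop into an index lookup (typically logarithmic in table size) instead of a direct pointer dereference.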

Graph databases shine in specific scenarios:

  • Social networks: Friend recommendations, influence analysis, community detection
  • Fraud detection: Identifying suspicious patterns across transactions and accounts
  • Knowledge graphs: Connecting entities, concepts, and their relationships
  • Recommendation engines: Product suggestions based on user behavior and similarities

When your queries frequently ask “how is X connected to Y?” or “what patterns exist in these relationships?”, graph databases deserve serious consideration.

Neo4j Fundamentals

Neo4j pioneered the property graph model and remains the most popular graph database. Its native graph storage engine stores nodes and relationships as fixed-size records with direct pointers, enabling true index-free adjacency.
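Why fixed-size records matter can be illustrated with a minimal Python sketch. It assumes a deliberately simplified layout (one 4-byte field per node, three 4-byte fields per relationship, singly linked chains) and is not Neo4j's actual store format, which uses larger records and doubly linked relationship chains. The point is that record n sits at byte offset n * record_size, so following a pointer is arithmetic plus one read, with no index lookup. The RecordStore class is hypothetical.

```python
import struct

# Hypothetical fixed-size layouts (little-endian, signed 32-bit ints).
# Node record: first_rel_id    Relationship record: start_id, end_id, next_rel_id
NODE_FMT = "<i"
REL_FMT = "<iii"
NODE_SIZE = struct.calcsize(NODE_FMT)   # 4 bytes
REL_SIZE = struct.calcsize(REL_FMT)     # 12 bytes
NO_REL = -1                             # sentinel: end of a relationship chain

class RecordStore:
    def __init__(self, node_count):
        # Every node starts with an empty relationship chain.
        self.nodes = bytearray(struct.pack(NODE_FMT, NO_REL) * node_count)
        self.rels = bytearray()

    def add_relationship(self, start, end):
        rel_id = len(self.rels) // REL_SIZE
        # The new relationship becomes the head of the start node's chain.
        (old_first,) = struct.unpack_from(NODE_FMT, self.nodes, start * NODE_SIZE)
        self.rels += struct.pack(REL_FMT, start, end, old_first)
        struct.pack_into(NODE_FMT, self.nodes, start * NODE_SIZE, rel_id)
        return rel_id

    def neighbors(self, node_id):
        # Offset arithmetic (id * size) replaces an index lookup.
        (rel_id,) = struct.unpack_from(NODE_FMT, self.nodes, node_id * NODE_SIZE)
        out = []
        while rel_id != NO_REL:
            _, end, rel_id = struct.unpack_from(REL_FMT, self.rels, rel_id * REL_SIZE)
            out.append(end)
        return out

store = RecordStore(4)  # 0=Alice, 1=Bob, 2=Carol, 3=David
for start, end in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    store.add_relationship(start, end)
print(store.neighbors(0))  # [2, 1]
```

Prepending each new relationship to the head of its chain keeps inserts constant-time, which is why neighbors come back newest-first in this sketch.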

Cypher, Neo4j’s declarative query language, uses ASCII-art syntax that mirrors how we naturally draw graphs on whiteboards:

// Create users and relationships
CREATE (alice:User {name: 'Alice', age: 30})
CREATE (bob:User {name: 'Bob', age: 28})
CREATE (carol:User {name: 'Carol', age: 32})
CREATE (david:User {name: 'David', age: 29})

CREATE (alice)-[:FOLLOWS]->(bob)
CREATE (alice)-[:FOLLOWS]->(carol)
CREATE (bob)-[:FOLLOWS]->(david)
CREATE (carol)-[:FOLLOWS]->(david);

// Find who Alice should follow (friend-of-friend recommendations)
MATCH (user:User {name: 'Alice'})-[:FOLLOWS]->(friend)-[:FOLLOWS]->(recommendation)
WHERE NOT (user)-[:FOLLOWS]->(recommendation) AND user <> recommendation
RETURN recommendation.name, COUNT(*) AS mutualFriends
ORDER BY mutualFriends DESC

Neo4j’s pattern matching capabilities make complex relationship queries intuitive:

// Find shortest path between users
MATCH path = shortestPath(
  (alice:User {name: 'Alice'})-[:FOLLOWS*]-(david:User {name: 'David'})
)
RETURN path;

// Variable-length matching: users reached indirectly (2-3 hops) by more than 100 people
MATCH (user:User)-[:FOLLOWS*2..3]->(influencer:User)
WHERE user <> influencer AND NOT (user)-[:FOLLOWS]->(influencer)
WITH influencer, COUNT(DISTINCT user) AS reach
WHERE reach > 100
RETURN influencer.name, reach
ORDER BY reach DESC
LIMIT 10;
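Conceptually, an unweighted shortestPath over an undirected pattern such as -[:FOLLOWS*]- is a breadth-first search that ignores edge direction. The sketch below shows that idea in Python on the four users from the earlier example; Neo4j's own implementation is more elaborate (it uses a bidirectional breadth-first search), and the function name here is hypothetical.

```python
from collections import deque

FOLLOWS = [("Alice", "Bob"), ("Alice", "Carol"),
           ("Bob", "David"), ("Carol", "David")]

def shortest_path_undirected(edges, start, goal):
    """Breadth-first search that treats every relationship as bidirectional."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent pointers back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # no path: the Cypher MATCH would return no row

path = shortest_path_undirected(FOLLOWS, "Alice", "David")
print(path, len(path) - 1)  # ['Alice', 'Bob', 'David'] 2
```

Because BFS explores level by level, the first time it reaches the goal is via a path with the fewest hops. Ties (Alice, Carol, David is also two hops) are broken by exploration order; Cypher's shortestPath likewise returns a single path, and allShortestPaths returns every tie.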

Neo4j Community Edition is free but single-instance only. Clustering and advanced features require Enterprise licensing, which can be expensive at scale.

ArangoDB Fundamentals

ArangoDB takes a different approach as a multi-model database supporting documents, graphs, and key-value pairs within a single engine. This flexibility eliminates the need for multiple database systems in complex applications.

In ArangoDB, graphs are built on top of document collections. Edges are themselves documents stored in edge collections; their _from and _to attributes hold the _id (collection/_key) of the vertex documents they connect:

// Create collections (arangosh, ArangoDB's JavaScript shell)
db._create("users");
db._createEdgeCollection("follows");

// Insert users
db.users.save({_key: "alice", name: "Alice", age: 30});
db.users.save({_key: "bob", name: "Bob", age: 28});
db.users.save({_key: "carol", name: "Carol", age: 32});
db.users.save({_key: "david", name: "David", age: 29});

// Create relationships
db.follows.save({_from: "users/alice", _to: "users/bob"});
db.follows.save({_from: "users/alice", _to: "users/carol"});
db.follows.save({_from: "users/bob", _to: "users/david"});
db.follows.save({_from: "users/carol", _to: "users/david"});
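The document-plus-edge model can be mimicked in plain Python to show how _from/_to values resolve. This is a minimal sketch under stated assumptions: dicts stand in for collections, a dict of lists stands in for the edge index ArangoDB maintains automatically on edge collections, and the helpers lookup and outbound are hypothetical, not python-arango APIs.

```python
from collections import defaultdict

# Vertex collection: documents keyed by _key (stand-in for db.users)
users = {
    key: {"_key": key, "_id": f"users/{key}", "name": key.capitalize()}
    for key in ["alice", "bob", "carol", "david"]
}
collections = {"users": users}

# Edge collection: ordinary documents plus _from/_to document IDs
follows = [
    {"_from": "users/alice", "_to": "users/bob"},
    {"_from": "users/alice", "_to": "users/carol"},
    {"_from": "users/bob", "_to": "users/david"},
    {"_from": "users/carol", "_to": "users/david"},
]

# Stand-in for the edge index on _from
edge_index = defaultdict(list)
for edge in follows:
    edge_index[edge["_from"]].append(edge)

def lookup(doc_id):
    """Resolve a 'collection/key' _id to the stored document."""
    collection, key = doc_id.split("/", 1)
    return collections[collection][key]

def outbound(vertex):
    """Rough analogue of FOR v IN 1..1 OUTBOUND vertex follows."""
    return [lookup(e["_to"]) for e in edge_index[vertex["_id"]]]

print([v["name"] for v in outbound(users["alice"])])  # ['Bob', 'Carol']
```

Because an _id carries its collection name, a single edge collection can connect vertices stored in different collections.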

AQL (ArangoDB Query Language) combines SQL-like syntax with graph traversal capabilities:

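// Friend-of-friend recommendations (excluding Alice and anyone she already follows)
FOR user IN users
  FILTER user.name == "Alice"
  LET alreadyFollowing = (FOR f IN 1..1 OUTBOUND user follows RETURN f._id)
  FOR friend IN 1..1 OUTBOUND user follows
    FOR recommendation IN 1..1 OUTBOUND friend follows
      FILTER recommendation._id != user._id
      FILTER recommendation._id NOT IN alreadyFollowing
      COLLECT rec = recommendation WITH COUNT INTO mutualCount
      SORT mutualCount DESC
      RETURN {name: rec.name, mutualFriends: mutualCount}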

ArangoDB’s strength lies in mixing data models. You can combine document queries with graph traversals:

// Top products in the user's interest categories, bought by people 2-3 hops away
FOR user IN users
  FILTER user._key == @userId
  // Global uniqueness: visit each person once, even if several paths lead to them
  FOR friend IN 2..3 OUTBOUND user follows
    OPTIONS {uniqueVertices: "global", order: "bfs"}
    FILTER friend._key != user._key
    FOR purchase IN purchases
      FILTER purchase.userId == friend._key
      FOR product IN products
        FILTER product._key == purchase.productId
        FILTER product.category IN user.interests
        COLLECT p = product WITH COUNT INTO popularity
        SORT popularity DESC
        LIMIT 10
        RETURN p
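Traversal uniqueness is a subtlety worth spelling out for queries like this. By default, AQL traversals only forbid reusing an edge within a single path (uniqueVertices: "none", uniqueEdges: "path"), so with 2..3 a person reachable along several paths is emitted once per path and their purchases are counted more than once; OPTIONS {uniqueVertices: "global", order: "bfs"} emits each vertex at most once. The Python sketch below contrasts the two behaviors on a small hypothetical graph; both functions are illustrations, not ArangoDB code.

```python
from collections import deque

FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["david"],
    "carol": ["david"],
    "david": ["erin"],
    "erin": [],
}

def traverse_per_path(graph, start, min_depth, max_depth):
    """Path-level uniqueness: an edge may not repeat within one path."""
    emitted = []

    def walk(vertex, depth, used_edges):
        if depth >= min_depth:
            emitted.append(vertex)
        if depth == max_depth:
            return
        for nxt in graph[vertex]:
            edge = (vertex, nxt)
            if edge not in used_edges:
                walk(nxt, depth + 1, used_edges | {edge})

    walk(start, 0, frozenset())
    return emitted

def traverse_global_bfs(graph, start, min_depth, max_depth):
    """Global vertex uniqueness: each vertex is emitted at most once."""
    seen = {start}
    emitted = []
    queue = deque([(start, 0)])
    while queue:
        vertex, depth = queue.popleft()
        if depth >= min_depth:
            emitted.append(vertex)
        if depth == max_depth:
            continue
        for nxt in graph[vertex]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return emitted

print(traverse_per_path(FOLLOWS, "alice", 2, 3))    # ['david', 'erin', 'david', 'erin']
print(traverse_global_bfs(FOLLOWS, "alice", 2, 3))  # ['david', 'erin']
```

With per-path uniqueness, David and Erin each appear twice (once via Bob, once via Carol), so any COLLECT ... WITH COUNT over their purchases would double their contribution.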

Performance Comparison

Performance depends heavily on workload characteristics. For pure graph traversals, Neo4j’s native storage provides advantages:

// Neo4j: Shortest path (optimized native algorithm)
MATCH path = shortestPath(
  (a:Person {id: 1})-[:KNOWS*]-(b:Person {id: 1000})
)
RETURN length(path)
// Typical: 15-30ms for 6-degree separation in 1M node graph

// ArangoDB: Shortest path (emits one vertex per step along the path)
// Note: the Neo4j pattern above is undirected; ANY instead of OUTBOUND is the exact equivalent
FOR v IN OUTBOUND SHORTEST_PATH
  'persons/1' TO 'persons/1000'
  GRAPH 'socialGraph'
  RETURN v
// Typical: 40-80ms for same dataset

Benchmark Results (1M nodes, 10M edges, commodity hardware; lower latency is better, higher insert throughput is better):

Operation                  | Neo4j       | ArangoDB     | Winner
Single-hop traversal       | 2 ms        | 3 ms         | Neo4j
3-hop pattern match        | 45 ms       | 120 ms       | Neo4j
Shortest path (6 degrees)  | 25 ms       | 65 ms        | Neo4j
Document + graph query     | N/A         | 80 ms        | ArangoDB
Bulk document insert       | 450 docs/s  | 1,200 docs/s | ArangoDB
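Working out the ratios from the benchmark table makes "Neo4j wins" more precise. A quick Python check, with values copied from the table (latencies in milliseconds, insert throughput in documents per second):

```python
# Values copied from the benchmark table: (Neo4j, ArangoDB)
latency_ms = {
    "single-hop traversal": (2, 3),
    "3-hop pattern match": (45, 120),
    "shortest path (6 degrees)": (25, 65),
}
insert_per_sec = (450, 1200)

# Latency: lower is better, so a ratio > 1 means Neo4j was faster.
speedups = {op: round(arango / neo, 2) for op, (neo, arango) in latency_ms.items()}
# Throughput: higher is better, so a ratio > 1 means ArangoDB was faster.
insert_ratio = round(insert_per_sec[1] / insert_per_sec[0], 2)

print(speedups)      # {'single-hop traversal': 1.5, '3-hop pattern match': 2.67, 'shortest path (6 degrees)': 2.6}
print(insert_ratio)  # 2.67
```

So the traversal gap ranges from about 1.5x on a single hop to about 2.6-2.7x on multi-hop queries, while ArangoDB's bulk inserts were about 2.7x faster in this benchmark.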

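Neo4j wins the pure graph operations, but ArangoDB’s multi-model flexibility can eliminate entire system components. If you need both document storage and graph queries, ArangoDB’s “slower” graph performance may still yield better end-to-end application performance, because one query replaces round trips between a graph database and a separate document store, as well as the work of keeping the two in sync.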

Use Case Analysis

Choose Neo4j when:

Your workload is graph-centric with deep traversals and complex pattern matching. Knowledge graphs exemplify this perfectly:

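// Knowledge graph: find related concepts through multiple relationship types,
// keeping only the strongest path to each concept
MATCH path = (concept:Concept {name: 'Machine Learning'})-[:RELATED_TO|PART_OF|ENABLES*1..4]-(related:Concept)
WHERE related <> concept
  AND ALL(rel IN relationships(path) WHERE rel.strength > 0.7)
WITH related,
     [rel IN relationships(path) | type(rel)] AS relationshipPath,
     reduce(score = 1.0, rel IN relationships(path) | score * rel.strength) AS relevanceScore
ORDER BY relevanceScore DESC
WITH related, collect(relationshipPath)[0] AS bestPath, max(relevanceScore) AS relevanceScore
RETURN related.name, bestPath AS relationshipPath, relevanceScore
ORDER BY relevanceScore DESC
LIMIT 20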
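The reduce expression scores a path by multiplying its relationship strengths, so every extra hop can only lower the score, and the ALL(... > 0.7) predicate prunes weak links before they are followed. A minimal Python sketch of the same scoring on a small hypothetical concept graph (names and strengths are invented), keeping the best score per concept. It walks simple paths (no repeated concept), which is slightly stricter than Cypher's rule of not repeating a relationship within a path.

```python
from math import prod

# Hypothetical undirected concept graph: (a, b, relationship type, strength)
RELS = [
    ("Machine Learning", "Statistics", "RELATED_TO", 0.9),
    ("Statistics", "Probability", "PART_OF", 0.8),
    ("Machine Learning", "Deep Learning", "ENABLES", 0.95),
    ("Deep Learning", "GPUs", "RELATED_TO", 0.6),  # below threshold: pruned
]

def best_scores(rels, start, max_hops=4, threshold=0.7):
    adjacency = {}
    for a, b, _rel_type, strength in rels:
        if strength > threshold:  # mirrors ALL(rel ... WHERE rel.strength > 0.7)
            adjacency.setdefault(a, []).append((b, strength))
            adjacency.setdefault(b, []).append((a, strength))
    best = {}

    def walk(node, strengths, visited):
        if strengths:
            score = prod(strengths)  # mirrors reduce(score = 1.0, ... | score * rel.strength)
            if score > best.get(node, 0.0):
                best[node] = score
        if len(strengths) == max_hops:
            return
        for nxt, s in adjacency.get(node, []):
            if nxt not in visited:
                walk(nxt, strengths + [s], visited | {nxt})

    walk(start, [], {start})
    return best

scores = best_scores(RELS, "Machine Learning")
for concept, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{concept}: {score:.2f}")
# Deep Learning: 0.95
# Statistics: 0.90
# Probability: 0.72
```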

Choose ArangoDB when:

You need flexible data modeling with occasional graph queries. An e-commerce platform demonstrates this well:

// E-commerce: Mix product catalog with social recommendations
FOR user IN users
  FILTER user._key == @currentUser
  
  // Get user's purchase history (document query)
  LET purchases = (
    FOR order IN orders
      FILTER order.userId == user._key
      RETURN order.productId
  )
  
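  // Find users 2-3 hops away in the follow graph (graph traversal),
  // visiting each person once so their orders are not counted twice
  FOR similar IN 2..3 OUTBOUND user follows
    OPTIONS {uniqueVertices: "global", order: "bfs"}
    FILTER similar._key != user._key

    // Get their recent purchases not in user's history
    FOR order IN orders
      FILTER order.userId == similar._key
      FILTER order.productId NOT IN purchases
      // Assumes order.timestamp is an ISO 8601 UTC string, like DATE_SUBTRACT's result
      FILTER order.timestamp > DATE_SUBTRACT(DATE_NOW(), 30, 'days')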
      
      // Enrich with product details (document join)
      FOR product IN products
        FILTER product._key == order.productId
        COLLECT p = product WITH COUNT INTO popularity
        SORT popularity DESC
        LIMIT 10
        RETURN p
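The 30-day filter compares order.timestamp with the value returned by DATE_SUBTRACT, which is an ISO 8601 date string in UTC. That comparison only works if timestamps are stored in the same string format: in AQL's type ordering numbers sort before strings, so a numeric timestamp would never compare greater, and strings with different UTC offsets do not sort chronologically. The Python sketch below illustrates both points with hypothetical timestamps; iso_utc is a hypothetical helper that produces the same fixed-width shape.

```python
from datetime import datetime, timedelta, timezone

def iso_utc(dt):
    """Fixed-width UTC ISO 8601 string with milliseconds, e.g. 2024-05-02T12:00:00.000Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)  # hypothetical "now"
cutoff = iso_utc(now - timedelta(days=30))  # analogous to DATE_SUBTRACT(DATE_NOW(), 30, 'days')

orders = [
    {"id": 1, "timestamp": iso_utc(datetime(2024, 5, 20, tzinfo=timezone.utc))},
    {"id": 2, "timestamp": iso_utc(datetime(2024, 4, 15, tzinfo=timezone.utc))},
]
# Same format and zone: string order equals chronological order.
recent = [o["id"] for o in orders if o["timestamp"] > cutoff]

# Mixing offsets breaks this: 13:00+02:00 is 11:00 UTC (before the cutoff),
# yet it sorts after the cutoff as a string.
offset_ts = "2024-05-02T13:00:00.000+02:00"
wrongly_recent = offset_ts > cutoff

print(cutoff, recent, wrongly_recent)  # 2024-05-02T12:00:00.000Z [1] True
```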

Integration and Deployment

Both databases offer comprehensive driver support. Here’s Python integration:

# Neo4j with official driver
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", 
                               auth=("neo4j", "password"))

def get_recommendations(user_name):
    with driver.session() as session:
        result = session.run("""
            MATCH (u:User {name: $name})-[:FOLLOWS]->()-[:FOLLOWS]->(rec)
            WHERE rec <> u AND NOT (u)-[:FOLLOWS]->(rec)
            RETURN rec.name AS name, count(*) AS score
            ORDER BY score DESC LIMIT 5
        """, name=user_name)
        # Consume the result before the session closes
        return [record.data() for record in result]

# ArangoDB with python-arango
from arango import ArangoClient

client = ArangoClient(hosts='http://localhost:8529')
db = client.db('social', username='root', password='password')

def get_recommendations(user_key):
    aql = """
    FOR user IN users FILTER user._key == @key
      LET following = (FOR f IN 1..1 OUTBOUND user follows RETURN f._key)
      FOR friend IN 1..1 OUTBOUND user follows
        FOR rec IN 1..1 OUTBOUND friend follows
          FILTER rec._key != @key AND rec._key NOT IN following
          COLLECT r = rec WITH COUNT INTO score
          SORT score DESC LIMIT 5
          RETURN {name: r.name, score: score}
    """
    return list(db.aql.execute(aql, bind_vars={'key': user_key}))

Docker deployment for development:

# docker-compose.yml
services:
  neo4j:
    image: neo4j:5.13
    ports:
      - "7474:7474"  # HTTP
      - "7687:7687"  # Bolt
    environment:
      NEO4J_AUTH: neo4j/password
    volumes:
      - neo4j_data:/data

  arangodb:
    image: arangodb:3.11
    ports:
      - "8529:8529"
    environment:
      ARANGO_ROOT_PASSWORD: password
    volumes:
      - arango_data:/var/lib/arangodb3

volumes:
  neo4j_data:
  arango_data:

Conclusion and Decision Framework

Your choice between Neo4j and ArangoDB should align with your data model and query patterns:

Select Neo4j if:

  • Graph queries dominate your workload (as a rough guide, more than 70% of operations)
  • You need maximum performance for deep traversals and pattern matching
  • Your team values specialized graph expertise and tooling
  • Budget accommodates enterprise licensing for production clustering

Select ArangoDB if:

  • You need multiple data models in one system
  • Graph queries complement document/key-value operations
  • Horizontal scaling is required without enterprise costs (check the Community Edition license terms for your ArangoDB version)
  • You prefer architectural simplicity over specialized performance

Both databases are production-ready, but they optimize for different scenarios. Neo4j is the scalpel for graph surgery; ArangoDB is the Swiss Army knife for varied data challenges. Choose based on whether you need depth or breadth in your data platform.
