Avoiding Deadlocks: Lessons Learned with Zephyr Health Using Neo4j and MongoDB - Mahesh Chaudhari @ GraphConnect SF 2013

Avoiding Deadlocks in Neo4j on Z-Platform
Mahesh Chaudhari, Cesar Arevalo & Brian Roy

Outline
• Introduction to the Z-Platform
• Problems caused by Deadlocks
• Locks and Deadlocks in Neo4j
• Avoidance using Bipartite graphs
• Performance
• Conclusion

Z-Platform
[Figure: Z-Platform architecture. Labels: Source Datasets; Z-Platform; Full Entity Profiles, JSON Documents, MongoDB Database; Sparse Representation of Profiles, Nodes & Edges, Neo4j Graph Database.]

Deadlocks in Z-Platform
• Creating relationships is one of the most time-consuming processes
• Log analysis reveals deadlocks among batch transactions, and the retry mechanism takes time
• Deadlocks depend on how nodes and relationships are grouped together
• Batch size depends on the size of the JSON block sent to the server
• Time required to build relationships and resolve deadlocks is on the order of seconds

Locks in Neo4j
• Create a Node n1 → Write Lock on Node n1
• Update a Node n1 → Write Lock on Node n1, but read available on Node n1
• Create a Relationship r1 between nodes n1 and n2 → Write Locks on relationship r1, n1 and n2
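The locking rules above can be sketched as a toy model. This is not Neo4j code: `write_locks` and `may_conflict` are hypothetical names, and the operation tuples are invented for illustration only.

```python
def write_locks(op):
    """Return the set of entities an operation write-locks, per the rules above.

    op is a hypothetical tuple: ("create_node", n), ("update_node", n),
    or ("create_rel", rel, src, dest).
    """
    kind = op[0]
    if kind in ("create_node", "update_node"):
        return {op[1]}
    if kind == "create_rel":
        _, rel, src, dest = op
        return {rel, src, dest}  # the relationship and both end nodes
    raise ValueError("unknown operation: %r" % (kind,))


def may_conflict(op_a, op_b):
    """Two operations contend when their write-lock sets overlap."""
    return bool(write_locks(op_a) & write_locks(op_b))


r1 = ("create_rel", "r1", "n1", "n2")
r2 = ("create_rel", "r2", "n1", "n3")  # shares node n1 with r1
r3 = ("create_rel", "r3", "n4", "n5")  # shares nothing with r1
print(may_conflict(r1, r2), may_conflict(r1, r3))  # prints: True False
```

Under this model, two relationship creations can only block each other when they touch a common node, which is the property the later batching slides rely on.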

Deadlocks across processes
• Processes: P1 and P2
• Nodes: A, B, C, D
• Relationships: R1, R2, R3, R4

[Figure: four scenarios in which P1 and P2 create relationships among nodes A, B, C, and D. Two are labeled "No Deadlocks", one "Possibility of Deadlock", and one "Deadlock".]

Deadlocks across Transactions
• Transactions behave like separate processes, whether they run in a single thread or in multiple threads
• Deadlocks occur across transactions
– Two concurrent transactions need write locks on the same node n1
– In two concurrent transactions, T1 holds a write lock on node n1 and is waiting for a write lock on node n2, whereas T2 holds a write lock on n2 and is waiting for a write lock on n1
– Transactions of varying sizes
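The T1/T2 case above is a cycle in a wait-for graph. A minimal sketch of detecting such a cycle follows; `find_deadlock`, `holds`, and `waits_for` are hypothetical names for this illustration, and each transaction is assumed to wait on at most one node at a time.

```python
def find_deadlock(holds, waits_for):
    """Return a list of transactions forming a wait cycle, or None.

    holds:     dict mapping transaction -> set of nodes it has write-locked
    waits_for: dict mapping transaction -> the single node it is waiting on
    """
    # Map each locked node back to the transaction that owns it.
    owner = {node: txn for txn, nodes in holds.items() for node in nodes}
    # Wait-for edge: a waiting transaction points at the owner of the node it wants.
    blocked_by = {txn: owner.get(node) for txn, node in waits_for.items()}

    for start in blocked_by:
        path = []
        txn = start
        while txn is not None and txn not in path:
            path.append(txn)
            txn = blocked_by.get(txn)
        if txn is not None:  # walked back onto the path: a cycle
            return path[path.index(txn):]
    return None


holds = {"T1": {"n1"}, "T2": {"n2"}}
waits_for = {"T1": "n2", "T2": "n1"}
print(find_deadlock(holds, waits_for))  # prints: ['T1', 'T2']
```

If only T1 were waiting (T2 holds n2 and wants nothing), the function returns None: T1 is blocked, but no cycle exists, so the wait ends when T2 commits.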

Concurrent Transactions Deadlocks
[Figure: two panels. T1 works on A and B while T2 works on C and D: "No Deadlocks". T1 works on A and B while T2 works on A and D: "Possibility of Deadlock".]

Sequential Asynchronous Transactions Deadlocks
[Figure: three scenarios in which transactions T1 and T2 each create n edges among nodes A to F. Two are labeled "No Deadlocks" and one "Deadlock".]

Deadlocks Detection and Avoidance
• Deadlock Detection
– Only possible at run-time
– Recovery from deadlock is either to abort or retry

• Deadlock Avoidance
– Reorder the operations to lower or eliminate the likelihood of deadlocks
• Graph Clustering Algorithms: most of them require knowledge of the entire graph

Clustering Relationships → Bipartite Graphs
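The abort-or-retry recovery listed under detection can be sketched as a small wrapper. Everything here is an assumption made for illustration: `DeadlockError` is a hypothetical stand-in for whatever error a database client raises, and the jittered exponential backoff is a common choice, not something the slides specify.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the error a client raises on deadlock."""


def run_with_retry(work, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run work(); on DeadlockError back off and retry, then give up (abort)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # out of attempts: abort and let the caller decide
            delay = base_delay * (2 ** (attempt - 1))
            # Random jitter keeps two colliding transactions from retrying in lockstep.
            sleep(delay + random.uniform(0, delay))
```

Each retry repeats the whole unit of work, which is why a batch that deadlocks often can cost seconds; avoidance aims to make this path rare.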

Bipartite Graphs
• A graph G with vertices V and edges E is bipartite if V can be partitioned into two independent sets V1 and V2, so that every edge in E joins a vertex in V1 to a vertex in V2.
[Figure: a graph on nodes A to E, redrawn with its nodes split into the two sets V1 and V2.]

Creating Bipartite Graphs
• Use two colors to color each node such that
no two adjacent nodes have the same color.
[Figure: a graph on nodes A to E colored with colors 1 and 2, then redrawn with the two color classes as V1 and V2.]
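The two-coloring rule can be sketched with a breadth-first traversal. The function name `two_color` and the sample edge lists are hypothetical; the edges drawn on the slide are not recoverable from this transcript.

```python
from collections import deque


def two_color(edges):
    """Try to 2-color an undirected graph given as (u, v) pairs.

    Returns (color, conflicts): color maps node -> 1 or 2, and conflicts lists
    the edges whose endpoints received the same color. The graph is bipartite
    exactly when conflicts is empty.
    """
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)

    color = {}
    for start in adjacency:  # one traversal per connected component
        if start in color:
            continue
        color[start] = 1
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 3 - color[node]  # the other color
                    queue.append(neighbor)

    conflicts = [(u, v) for u, v in edges if color[u] == color[v]]
    return color, conflicts


square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
triangle = [("A", "B"), ("B", "C"), ("C", "A")]
print(two_color(square)[1])    # prints: []
print(two_color(triangle)[1])  # prints: [('B', 'C')]
```

An odd cycle such as the triangle always leaves at least one edge with same-colored endpoints, so it cannot be split into two independent sets.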

Non-Bipartite Graphs
[Figure: a graph on nodes A to E colored with colors 1 and 2 and split into V1 and V2; this graph is not bipartite.]

Algorithm to generate Graph
• Create all the nodes
• Create batches of relationships among the same colored nodes
• Create batches of relationships across the two colors

[Figure: nodes A to E arranged in the two sets V1 and V2.]

Algorithm in Z-Platform
• Batch of relationships R = {r1, r2, r3, …, rn}:
– each r is a triplet {src, dest, props}, where src and dest are nodes and props is a set of key-value pairs

• Color the nodes with two colors, based on each relationship
• Mark the conflicting edges, where both the src and dest nodes have the same color
• Batch these conflicting relationships together in a single batch
• Group the remaining edges such that no two batches have any node in common
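The steps above can be sketched as follows. This is one possible reading, not the Z-Platform implementation: all function names are hypothetical, the coloring is a greedy pass in input order, the node-disjoint grouping is done with union-find (connected components), and the configurable batch-size cap is ignored.

```python
from collections import defaultdict


def color_nodes(relationships):
    """Greedily give each node color 0 or 1, one relationship at a time."""
    color = {}
    for src, dest, _props in relationships:
        if src not in color and dest not in color:
            color[src], color[dest] = 0, 1
        elif src not in color:
            color[src] = 1 - color[dest]
        elif dest not in color:
            color[dest] = 1 - color[src]
    return color


def group_disjoint(relationships):
    """Group edges so that no two groups share a node (union-find)."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dest, _props in relationships:
        parent[find(src)] = find(dest)
    groups = defaultdict(list)
    for rel in relationships:
        groups[find(rel[0])].append(rel)
    return list(groups.values())


def plan_batches(relationships):
    """Return batches: the conflicting batch first, then node-disjoint batches."""
    color = color_nodes(relationships)
    conflicting = [r for r in relationships if color[r[0]] == color[r[1]]]
    clean = [r for r in relationships if color[r[0]] != color[r[1]]]
    batches = group_disjoint(clean)
    if conflicting:
        batches.insert(0, conflicting)
    return batches


rels = [("A", "B", {}), ("B", "C", {}), ("A", "C", {}), ("D", "E", {})]
for batch in plan_batches(rels):
    print([(src, dest) for src, dest, _ in batch])
```

For this input the conflicting batch holds only A-C, and the remaining edges fall into two batches, {A-B, B-C} and {D-E}, that share no node. The conflicting batch may still share nodes with the others, so this sketch assumes it is run on its own rather than concurrently with them.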

Performance – Test Setup
• JDK 1.7
• Neo4j Java Binding REST API
• Neo4j Enterprise Server 1.9
• Batch size (configurable): 2000
• Test program that generates random nodes (max 1000) and relationships (max 10,000)
• Huge file that contains 10,226 nodes and 39,564,960 relationships (5 GB)

Performance – Creating Nodes
[Chart: "Time in seconds for Nodes". Y-axis: time in seconds, 0 to 1.8; x-axis: 1 to 6.]

• 10,226 Nodes: 5.07 seconds
• Average Time for 2000 Nodes: 0.99 seconds ~ 1 second
• Each Node has 11 properties

Performance – Creating Relationships
[Chart: "Time in Seconds for relationships". Y-axis: time in seconds, 0 to 1.4; x-axis: 1 to 87.]

• 174,000 Relationships created in 47.16 seconds
• Average Time for 2000 relationships: 0.54 seconds
• Number of relationships per second: 3,689

Performance – Creating 39 Million Relationships

• 39,564,960 Relationships in 10,573.56 seconds (2 hrs 56 mins 13 seconds)
• Average Time for 2000 relationships: 0.53 seconds
• Number of relationships per second: 3,741
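The reported relationship figures can be cross-checked from the raw counts and times on the two relationship slides. The helper names `summarize` and `as_hms` are hypothetical; the inputs are the numbers stated on the slides (174,000 relationships in 47.16 seconds, and 39,564,960 in 10,573.56 seconds, with a batch size of 2000).

```python
BATCH_SIZE = 2000


def summarize(relationships, seconds, batch_size=BATCH_SIZE):
    """Return (relationships per second, average seconds per batch)."""
    batches = relationships / batch_size
    return relationships / seconds, seconds / batches


def as_hms(seconds):
    """Split a duration in seconds into (hours, minutes, seconds)."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return hours, minutes, secs


for count, elapsed in [(174000, 47.16), (39564960, 10573.56)]:
    per_second, per_batch = summarize(count, elapsed)
    print(int(per_second), round(per_batch, 2), as_hms(elapsed)[:2])
```

The per-second rates come out near 3,689.6 and 3,741.9, so the slide values of 3,689 and 3,741 match after dropping the fraction. The per-batch averages round to 0.54 and 0.53 seconds, and 10,573.56 seconds is 2 hours 56 minutes and about 13.6 seconds.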

Graph Visualized in Neo4j 2.0

[Figure: the graph as visualized in Neo4j 2.0.]

Future Work
• Test performance over the network using Amazon EC2 servers to mimic a real-world setup
• Move from a single-threaded application to a multi-threaded one to see whether performance improves
– More complex algorithm to batch relationships together
– Analyze if the complexity is worth the performance improvement

• Vary multiple factors:
– Batch size: 1000 to 4000
– Properties (relationship descriptors): 2 to 20

• Dispatcher Pattern to facilitate single-point distribution of nodes and relationships to threads/transactions

Conclusion
• Deadlocks in general are time consuming and difficult to detect and prevent
• Graph coloring is used to partition the graph into conflicting and non-conflicting edges
• Successful prototype tests show significant improvement in building relationships, from small numbers to very large numbers

Contact Information

Dr. Mahesh Chaudhari
Sr. Software Engineer
+1 602 524 0610
mahesh@zephyrhealthinc.com

Sven Junkergård
Director of Technology
+1 415 503 7412
sven@zephyrhealthinc.com

Brian Roy
Director of Platform Engineering & Architect
+1 415 663 6919
brian@zephyrhealthinc.com

jobs@zephyrhealthinc.com

Zephyr Health Inc.
589 Howard St. 3rd Flr.
San Francisco, California 94105
+1.415.529.7649
zephyrhealthinc.com
