Monday, February 9, 2015

Working with Neo4J

Introduction

Graph databases are becoming increasingly important as complex, highly connected data, especially social data, keeps growing. Facebook, for example, manages and processes its users, their friends, and their mutual social connections via graph processing. Plenty of graph databases are available, such as neo4j, OrientDB, FlockDB, and GraphDB. neo4j is widely considered the leading graph database.

Installation

Check Java

neo4j is written in Java, so first verify that Java is installed on your machine (run java -version). Any version >= 7 is fine; neo4j works with both OpenJDK 7 and Oracle JDK 7.
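If you want to script this check, the version can be read from the text that java -version prints. The sketch below is only an illustration: the helper names java_major_version and is_supported are made up for this post, and the parsing assumes the usual 'version "1.7.0_75"' or 'version "11.0.2"' layout.

```python
import re


def java_major_version(version_output):
    """Return the major Java version found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Older JDKs report "1.7.0_75" (the major version is the second field);
    # newer ones report "11.0.2" (the major version is the first field).
    if first == 1 and second is not None:
        return int(second)
    return first


def is_supported(version_output, minimum=7):
    """Return True if the reported Java version is at least `minimum`."""
    major = java_major_version(version_output)
    return major is not None and major >= minimum
```

Note that java -version writes to stderr, so when capturing it from Python you would typically call subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT) and pass the decoded text to is_supported.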

Download neo4j

neo4j comes in two flavours: the Enterprise edition (not free) and the Community edition (free). Download the Community edition archive from the official site and you are good to go. Then extract the downloaded file, neo4j-community-2.1.7-unix.tar.gz:

tar -xzvf neo4j-community-2.1.7-unix.tar.gz

This creates the directory 'neo4j-community-2.1.7'. Move into the extracted directory:

cd neo4j-community-2.1.7

Start the neo4j server with:

bin/neo4j start

Web interface

Once the server is up and running, you can access it from a web browser. neo4j listens on port 7474, so open http://localhost:7474 to check that the server is running. You should see something similar to the following figure.
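The same check can be done from a script instead of the browser. The following is a rough sketch, not an official health check: the function name neo4j_is_up is made up here, and it only tests that some HTTP server answers with status 200 on the given URL, not that the server is really neo4j.

```python
try:
    from urllib.request import urlopen      # Python 3
    from urllib.error import URLError
except ImportError:
    from urllib2 import urlopen, URLError   # Python 2


def neo4j_is_up(base_url="http://localhost:7474", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        response = urlopen(base_url, timeout=timeout)
    except (URLError, IOError):
        # connection refused, timeout, DNS failure or an HTTP error status
        return False
    try:
        return response.getcode() == 200
    finally:
        response.close()
```

Calling neo4j_is_up() right after bin/neo4j start may return False for a few seconds while the server is still booting, so a small retry loop around it can be useful.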



Web console

neo4j also offers a console in its webadmin interface. This interface provides much more detail about the neo4j server, such as the number of nodes, the number of relationships, and the properties stored on the server. From this interface you can also execute Cypher queries; Cypher is the query language used in neo4j. The webadmin interface is available at http://localhost:7474/webadmin/

webadmin interface

console interface

Cypher (neo4j Query Language)

neo4j comes with its own query language called Cypher. There are many resources online about Cypher and its syntax. A good shortcut is the cheat sheet, which covers many of the common queries:
http://assets.neo4j.org/download/Neo4j_CheatSheet_v3.pdf
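To give a feel for Cypher without leaving Python, here is a small sketch that wraps a query in the JSON body used by the transactional HTTP endpoint of neo4j 2.x. Treat the details as assumptions: the endpoint path is the default one for a local 2.x server, the helper names cypher_payload and run_cypher are made up for this post, and the Person label with its properties is purely illustrative.

```python
import json

try:
    from urllib.request import Request, urlopen   # Python 3
except ImportError:
    from urllib2 import Request, urlopen          # Python 2

# Assumed default endpoint of a local neo4j 2.x server.
CYPHER_ENDPOINT = "http://localhost:7474/db/data/transaction/commit"

# Illustrative queries; {name} and {age} are neo4j 2.x style parameters.
CREATE_PERSON = "CREATE (p:Person {name: {name}, age: {age}}) RETURN p"
COUNT_NODES = "MATCH (n) RETURN count(n) AS nodes"


def cypher_payload(statement, parameters=None):
    """Wrap one Cypher statement in the JSON body the endpoint expects."""
    return json.dumps({
        "statements": [
            {"statement": statement, "parameters": parameters or {}}
        ]
    })


def run_cypher(statement, parameters=None, endpoint=CYPHER_ENDPOINT):
    """POST one statement to the server and return the decoded JSON reply."""
    body = cypher_payload(statement, parameters).encode("utf-8")
    request = Request(endpoint, data=body,
                      headers={"Content-Type": "application/json",
                               "Accept": "application/json"})
    response = urlopen(request)
    try:
        return json.loads(response.read().decode("utf-8"))
    finally:
        response.close()
```

With a server running, run_cypher(CREATE_PERSON, {"name": "node0", "age": 30}) followed by run_cypher(COUNT_NODES) should create one node and report the node count; the same two queries can also be typed directly into the web console.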


Small program using Python

There are plenty of Python libraries available for neo4j. Of these, py2neo appears to be the most mature and actively maintained. Its latest release at the time of writing is 2.0.

The following program creates persons and establishes friendship relations between them. Pass the number of persons to create as the first command-line argument.


# Python 2 script. The calls below (GraphDatabaseService, create_path) appear
# to follow the py2neo 1.x-style API and may need adapting for other releases.
from py2neo import neo4j
from random import randint
import random
import time
import sys

# initialize the random seed so that runs are repeatable
random.seed(1234567)
about = ["sample about me", "I appreciate this", "I don't like working",
         "it is tedious to work for late hours"]
node_labels = ["human", "physicist", "doctor", "phd"]
graph_db = neo4j.GraphDatabaseService()

def clear_db():
    # delete every node and relationship, then print the remaining node count
    graph_db.clear()
    print 'db size: %s' % graph_db.order

def create_sample_nodes(size):
    for n in xrange(size):
        node = graph_db.create(
            {
                "name": "node%s" % n,
                "number": n,
                "age": randint(20, 50),
                "gender": 'Male' if randint(1, 2) == 2 else 'Female',
                "about_me": about[randint(0, 3)],
                "married": randint(1, 2) == 2,
                "marks": random.uniform(60.1, 85.9)
            }
        )
        # add between one and three labels, each as a separate label
        node[0].add_labels(*node_labels[0:randint(1, 3)])
    print graph_db.order

def read_sample_nodes(size):
    # assumes the node ids run from 0 to size-1 after clear_db()
    for n in xrange(size):
        node = graph_db.node(n)
        print node

def create_friend_relations(size):
    # create a friendship relation between each node and a random other node
    for n in xrange(size):
        node = graph_db.node(n)
        o = randint(0, size - 1)  # select a random node
        # avoid a relation from a node to itself
        while o == n and size > 1:
            o = randint(0, size - 1)
        other_node = graph_db.node(o)
        props = {"since": time.time(),
                 "location": "%s:%s" % (randint(10, 360), randint(10, 360))
                 }
        node.create_path(("Friends", props), other_node)

if __name__ == '__main__':
    clear_db()
    st_time = time.time()
    size = int(sys.argv[1])  # number of persons, e.g. 100
    create_sample_nodes(size)
    create_friend_relations(size)
    read_sample_nodes(size)
    print 'time taken: %s' % (time.time() - st_time)
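The program wants to avoid a friendship from a node to itself. One way to guarantee that without retrying is to draw from one slot fewer and skip over the node's own index. This is just a sketch of that idea, independent of py2neo; the helper name pick_other is made up for this post.

```python
import random


def pick_other(n, size, rng=random):
    """Return a random index in [0, size) that is different from n."""
    if size < 2:
        raise ValueError("need at least two nodes to pick a different one")
    # Draw from size-1 slots, then shift past n so that every other
    # index is equally likely and n itself can never be returned.
    o = rng.randrange(size - 1)
    return o + 1 if o >= n else o
```

In create_friend_relations, the random choice of the other node could then be written as o = pick_other(n, size).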


Hope this helps you get started with the neo4j database :)

References:
neo4j: http://neo4j.com/
py2neo: http://py2neo.org/2.0/
