Data Science: Graphical Analysis of data using Neo4j and Gephi Tool

Prince Patel
Oct 29, 2021

A graph database is, quite literally, a database designed to treat the relationships between data as equally important as the data itself. It is intended to hold data without constraining it to a pre-defined model. Instead, the data is stored the way we would first sketch it out, showing how each entity connects to or is related to the others.
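To make the idea of relationships being stored as data more concrete, here is a minimal Python sketch of a property graph. It is only an illustration of the concept, not how Neo4j is implemented; the class names `Node` and `Relationship`, the helper `related`, and the tiny sample dataset are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # e.g. "Person" or "Movie"
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    type: str                                    # e.g. "DIRECTED"
    end: Node
    properties: dict = field(default_factory=dict)


# Illustrative sample data (assumed for this sketch).
rob = Node("Person", {"name": "Rob Reiner"})
tom = Node("Person", {"name": "Tom Cruise"})
movie = Node("Movie", {"title": "A Few Good Men", "released": 1992})

# Relationships are stored as records of their own, with a type and properties.
relationships = [
    Relationship(rob, "DIRECTED", movie),
    Relationship(tom, "ACTED_IN", movie, {"roles": ["Lt. Daniel Kaffee"]}),
]


def related(node, rel_type):
    """Return nodes linked to `node` by a `rel_type` relationship, in either direction."""
    found = []
    for rel in relationships:
        if rel.type != rel_type:
            continue
        if rel.start is node:
            found.append(rel.end)
        elif rel.end is node:
            found.append(rel.start)
    return found


directors = [n.properties["name"] for n in related(movie, "DIRECTED")]
print(directors)  # ['Rob Reiner']
```

Because each relationship is a record in its own right, answering "who directed this movie?" is a matter of following links rather than joining tables.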

What is Neo4j?

Neo4j is an open-source, NoSQL, native graph database that provides applications with an ACID-compliant transactional backend. Development started in 2003, and it has been publicly available since 2007. The source code, written in Java and Scala, is available free of charge on GitHub, and Neo4j can also be downloaded as a user-friendly desktop application. Neo4j is offered in both a Community Edition and an Enterprise Edition.

Difference between Graph and Relational Database

1. Show movies released after 2006.

Query:

MATCH (m:Movie) WHERE m.released > 2006 RETURN m

5 movies released after 2006

2. Query movies released after 2002 and limit the result to 1 movie.

Query:

MATCH (m:Movie) WHERE m.released > 2002 RETURN m LIMIT 1

1 movie was displayed

3. The query below returns each person, the DIRECTED relationship, and the movie, for movies released after 2007, limited to 5 results.

Query:

MATCH (p:Person)-[d:DIRECTED]-(m:Movie) WHERE m.released > 2007 RETURN p, d, m LIMIT 5

9 nodes were displayed

4. To list the persons available in the database, use the following query, which limits the output to 10 people.

Query:

MATCH (p:Person) RETURN p LIMIT 10

Displaying 10 nodes

5. To check whether a movie with a particular title is present, use the following query.

Query:

MATCH (m:Movie {title: 'A Few Good Men'}) RETURN m

Displaying 1 node
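The queries above have their values typed directly into the query text. As a rough sketch of how query 2 could be issued from Python with parameters instead, the snippet below builds the Cypher text once and passes the year and limit separately. The function `movies_released_after`, its `run` argument, and the sample rows are assumptions for illustration. `run` stands in for something like the `run` method of a session from the official `neo4j` Python driver, which accepts query text plus keyword parameters; here a small fake is used so the sketch runs without a database.

```python
# Cypher text using $parameters rather than values pasted into the string.
MOVIES_AFTER = (
    "MATCH (m:Movie) WHERE m.released > $year "
    "RETURN m.title AS title, m.released AS released "
    "LIMIT $limit"
)


def movies_released_after(run, year, limit=5):
    """Run the query through `run` and return (title, released) tuples.

    `run` is any callable taking (query, **params) and returning records
    that can be indexed by column name.
    """
    records = run(MOVIES_AFTER, year=year, limit=limit)
    return [(record["title"], record["released"]) for record in records]


# Hypothetical rows used only by the stand-in below.
SAMPLE_ROWS = [
    {"title": "A Few Good Men", "released": 1992},
    {"title": "Hypothetical Movie", "released": 2008},
]


def fake_run(query, **params):
    """Stand-in for a real session: filters SAMPLE_ROWS the way the query would."""
    rows = [row for row in SAMPLE_ROWS if row["released"] > params["year"]]
    return rows[: params["limit"]]


print(movies_released_after(fake_run, year=2002, limit=1))
# [('Hypothetical Movie', 2008)]
```

Passing values as parameters keeps the query text constant and avoids building Cypher by string concatenation.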

Gephi:

Gephi is an open-source network analysis and visualization software package written in Java on the NetBeans platform.

The Gephi Toolkit project packages essential modules (Graph, Layout, Filters, IO…) in a standard Java library, which any Java project can use for getting things done. The toolkit is just a single JAR that anyone can reuse in new Java applications to perform tasks that can be done in Gephi automatically, from a command-line program for instance. The ability to use Gephi features like this in other Java applications boosts possibilities and promises to be very useful.

1. First, open Gephi and click New Project. Then choose File->Open and load the dataset of your choice, as shown below.

Load the dataset

The import report in the image shows that there are no issues and lists the nodes and edges available in the dataset.

2. Below is how all the nodes and edges are displayed after the dataset is loaded.

display nodes and edges

3. Then click on Layout, choose ForceAtlas, and click the Run button.

image of ForceAtlas

4. Next, we can differentiate the nodes by a ranking such as In-Degree, Out-Degree, or Degree and show them in different colors. To do this, in the top-left pane choose Nodes->Ranking and pick a ranking; in the image below, Degree is chosen.

Ranking of nodes and edges

5. To view the data table, click Window->Data Table.

data table

6. Next, we generate a degree distribution graph for Degree, In-Degree, and Out-Degree and also get the Average Degree value across all nodes. To do this, choose the Statistics tab in the right pane and run Average Degree in the Network Overview section.

avg degree

7. The Average Degree report appears when you click the Run button shown in the image above.

Degree Report
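To show what the degree figures in such a report are made of, here is a small Python sketch that computes in-degree, out-degree, total degree, a degree distribution, and averages from a directed edge list. The edge list and the function name `degree_stats` are hypothetical, and this is only the underlying arithmetic, not Gephi's implementation; the convention Gephi uses for the reported average on directed graphs may differ from the plain mean of total degree shown here.

```python
from collections import Counter

# Hypothetical directed edge list: (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a")]


def degree_stats(edge_list):
    """Compute per-node degrees, the degree distribution, and averages."""
    in_degree, out_degree = Counter(), Counter()
    nodes = set()
    for source, target in edge_list:
        out_degree[source] += 1      # edge leaves the source node
        in_degree[target] += 1       # edge arrives at the target node
        nodes.update((source, target))

    # Total degree is the sum of incoming and outgoing edges per node.
    degree = {node: in_degree[node] + out_degree[node] for node in nodes}

    return {
        "in_degree": {node: in_degree[node] for node in nodes},
        "out_degree": {node: out_degree[node] for node in nodes},
        "degree": degree,
        # How many nodes have each total degree value.
        "distribution": dict(Counter(degree.values())),
        "average_degree": sum(degree.values()) / len(nodes),
        "average_in_degree": sum(in_degree.values()) / len(nodes),
    }


stats = degree_stats(edges)
print(stats["degree"])
print(stats["average_degree"], stats["average_in_degree"])
```

For this sample, node `a` has the highest total degree (two incoming and two outgoing edges), which is the kind of node a Degree ranking would highlight.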
