In this article by Devangana Khokhar, the author of Gephi Cookbook, we'll learn about the statistical properties of graphical networks and how they can exploit these properties with the help of Gephi.

Gephi provides some ready-to-use ways to study the statistical properties of graphical networks. These statistical properties include the properties of the network as a whole, as well as individual properties of nodes and edges within the network. This article will enable you to learn some of these properties and how to use Gephi to explore them. So let's get started!

(For more resources related to this topic, see here.)

Selecting a list of metrics for a graph

Gephi offers a wide variety of metrics for exploring graphs. These metrics allow users to explore graphs from various perspectives. In this recipe, we will learn how to access these different metrics for a specified graph.

Getting ready

Load a graph of your choice in Gephi.

How to do it…

To view different metrics available in Gephi for exploring a graph, follow these steps:

In the Statistics panel situated on the right-hand side of the Gephi window, find the tab that reads Settings.
Click on the Settings tab to open up a pop-up window.
From the list of available metrics in the pop-up window, check the ones that you would like to work with:
Click on OK. The Statistics panel will get populated with the selected metrics, as shown in the following screenshot:

Finding the average degree and average weighted degree of a graph

The degree of a node in a graph is defined as the number of edges that are incident on that node. The loops—that is, the edges that have the same node as their starting and end point—are counted twice. In this recipe, we will learn how to find the average degree and average weighted degree for a graph.

How to do it…

The following steps illustrate the process to find the average degree and weighted degree of a graph:

Load or create a graph in Gephi. For this recipe, we will consider the Les Misérables network that's already available in Gephi and can be loaded at the Welcome screen.
In the Statistics panel located on the right-hand side of the Gephi application window, under the Network Overview tab, click on the Run button located beside Average Degree:
This opens up a window containing the degree report for the Les Misérables network, as shown in the following screenshot. In the case of directed graphs, the report contains the in-degree and out-degree distributions as well:

The graph in the preceding screenshot depicts the degree distribution for the Les Misérables network. This pop-up window has options for printing, copying, and/or saving the degree report.

The average degree of the Les Misérables network is now displayed in the Statistics panel beside the Run button for Average Degree, as shown in the following screenshot:
To find the average weighted degree of the Les Misérables graph, hit the Run button adjacent to Avg. Weighted Degree in the Network Overview tab of the Statistics panel in the Gephi window.
This will open up a window containing the weighted degree report of the Les Misérables network, as shown in the following screenshot:

The average weighted degree of the Les Misérables graph is now also displayed in the Statistics panel that is adjacent to the Run button for Avg. Weighted Degree:

How it works…

The average degree for a graph is the measure of how many edges there are in the graph compared to its number of vertices. To find out the average degree for a graph, Gephi computes the sum of the degrees of individual nodes in the graph and divides that by the number of nodes present in it.

To find the average weighted degree for a graph with weighted edges, Gephi computes the average mean of the sum of the weights of the incident edges on all the nodes in the graph.

There's more…

If you have closed the report window and wish to see it once again, click on the small button with a question mark adjacent to the Run button. This will reopen the degree report.

Finding the network diameter

The diameter of a network refers to the length of the longest of all the computed shortest paths between all pair of nodes in the network.

How to do it…

The following steps describe how to find the diameter of a network using the capabilities offered by Gephi:

Click on Window in the menu bar located at the top of the Gephi window. From the drop-down, select Welcome. Click on Les Miserables.gexf.
In the pop-up window, select Graph Type as Directed. This opens up the directed version of the Les Misérables network into Gephi.
In the Statistics panel, under the Network Overview tab, click on the Run button, which is next to Network Diameter, to open the Graph Distance settings window:
In the Graph Distance settings window, you can decide on which type of graph, Directed or UnDirected, the diameter algorithm has to be run. If you have loaded an undirected graph, the Directed radio button will remain deactivated. If a directed graph is chosen, you can choose between the directed and undirected versions of it to find the diameter.
Check the box next to Normalize Centralities in [0, 1] to allow Gephi to normalize the three centralities' values between zero and one. The three centralities being referred to here are Betweenness Centrality, Closeness Centrality, and Eccentricity.
Click on OK. This opens up the Graph Distance Report window, as displayed in the following screenshot, that shows the value of the network diameter, network radius, average path length, number of shortest paths, and three separate graphs depicting betweenness centrality distribution, closeness centrality distribution, and eccentricity distribution:

How it works…

The diameter of a network gives us the maximum number of hops that must be made to travel from one node in the graph to the other. To find the diameter, all the shortest paths between every pair of nodes in the graph are computed and then the length of the longest of them gives us the diameter of the network. If the network is disconnected—that is, if the network has multiple components that do not share an edge between them—then the diameter of such a network is infinite.

Note that, in the case of weighted graphs, the longest path that determines the diameter of the graph is not the actual length of the path but the number of hops that would be required to traverse from the starting vertex to the end vertex.

The computation of the diameter of a graphical network makes use of a property called the eccentricity of nodes. The eccentricity of a node is a measure of the number of hops required to reach the farthest node in the graph from this node. The diameter is then the maximum eccentricity among all the nodes in the graph.

There's more…

There are three concepts—betweenness centrality, closeness centrality, and eccentricity—that have been introduced in this recipe. Eccentricity has already been covered in the How it works… section of this recipe. Betweenness centrality and closeness centrality are yet more important statistical properties of a network and are applied in a lot of real-world problems such as finding influential people in a social network, finding crucial hubs in a computer network, finding congestion nodes in wireless networks, and so on.

The betweenness centrality of a node is an indicator of its centrality or importance in the network. It is described as the number of shortest paths from all the vertices to all the other vertices in the network that pass through the node in consideration.

The closeness centrality of a node measures how accessible every other node in the graph is from the considered node. It is defined as the inverse of the sum of shortest distances of every other node in the network from the current node. Closeness centrality is an indicator of the speed at which information will transfuse into the network, starting from the current node.

Yet another concept that has been mentioned in this recipe is the radius of the graph. The radius of a graph is the opposite of its diameter. It is defined as the minimum eccentricity among the vertices of the graph. In other words, it refers to the minimum number of hops that are required to reach from one node of the graph to its farthest node.

Finding graph density

One another important statistical metric for graphs is density. In this recipe, you will learn what graph density is and how to compute it in Gephi.

How to do it…

The following steps illustrate how to use Gephi to figure out the graph density for a chosen graph:

Load the directed version of the Les Misérables network in Gephi, as described in the How to do it… section of the previous recipe.
In the Statistics panel located on the right-hand side of the Gephi application window, click on the Run button that is placed against Graph Density.
This opens up the Density settings window, as shown in the following screenshot, where you can choose between the directed or the undirected version of the graph to be considered for the computation of graph density:
Click on OK. This opens up the following Graph Density Report window:

How it works…

A complete graph is a graph in which every pair of nodes is connected via a direct edge. The density of a graph is a measure of how close the graph is to a complete graph with the same number of nodes. It is defined as the ratio of the total number of edges present in a graph to the total number of edges possible in the graph. The total number of edges possible in a simple undirected graph is mathematically computed as (N(N-1))/2, where N is the number of nodes in the graph. A simple graph is a graph that has no loops and not more than one edge between the same pair of nodes.

There's more…

The density of the undirected version of a graph with n nodes will be twice of that of the directed version of the graph. This is because, in a directed graph, there are two edges possible between every pair of nodes, each with a different direction.

Finding the HITS value for a graph

Hyperlink-Induced Topic Search (HITS) is also known as hubs and authorities. It is a link analysis algorithm and is used to evaluate the relationship between the nodes in a graph. This algorithm aims to find two different scores for each node in the graph: authority, which indicates the value of the information that the node holds, and hub, which indicates the value of the link information to the other linked nodes to this node. In this recipe, you will learn about HITS and how Gephi is used to compute this metric for a graph.

How to do it…

Considering the directed version of the Les Misérables network, the following steps describe the process of determining the HITS score for a graph in Gephi:

In Gephi's menu bar, click on Window. From the drop-down menu, select Welcome.
In the window that just opened, click on Les Miserables.gexf. This opens up another window.
In the Import Report window, select Graph Type as directed.
With the directed version of Les Misérables loaded in Gephi, click on the Run button placed next to HITS in the Network Overview tab of the Statistics panel.
This opens up the HITS settings window, as shown in the following screenshot:
Choose the graph type, Directed or UnDirected, on which you would want to run the HITS algorithm.
Enter the stopping criterion value in the Epsilon textbox. This determines the stopping point for the algorithm.
Hit OK. This opens up HITS Metric Report with a graph depicting the hub and authority distribution for the graph:

How it works…

The HITS algorithm was developed by Professor Jon Kleinberg from the department of computer science at Cornell University at around the same time as the PageRank algorithm was being developed. The HITS algorithm is a link analysis algorithm that helps in identifying the crucial nodes in a graph. It assigns two scores, a hub score and an authority score, to each of the nodes in the graph. The authority score of a node is a measure of the amount of valuable information that this node holds. The hub score of a node depicts how many highly informative nodes or authoritative nodes this node is pointing to. So a node with a high hub score shows that this node is pointing to many other authoritative nodes and hence serves as a directory to the authorities. On the other hand, a node with a high authoritative score shows that it is pointed to by a large number of nodes and hence serves as a node of useful information in the network.

One thing that you might have noticed is the Epsilon or stopping criterion for the HITS algorithm being mentioned in one of the steps of the recipe. Computation of HITS makes use of matrices and something called Eigenvalues. The value of Epsilon instructs the algorithm to stop when the difference between eigenvalues of the matrices for two consecutive iterations becomes negligibly small. The detailed discussion of eigenvalues and any mathematical treatment of the HITS algorithm are outside the scope of this article but there are some really good resources available online that explain these concepts very well. Some of these resources are also mentioned in the See also section of this recipe.

There's more…

Since its introduction, there has been a plethora of research on applications of the HITS algorithm to real-world problems such as finding pages with valuable information on the World Wide Web, a problem otherwise known as webpage ranking. There also has been intensive research on improving the time complexity of the HITS algorithm. A simple search on http://scholar.google.com/ for HITS will reveal some of the interesting research that has been, and is being, carried out in this domain.

Finding a graph's modularity

The modularity of a graph is a measure of its strength and describes how easily the graph could be disintegrated into communities, modules, or clusters. In this recipe, the concept of modularity, along with its implementation in Gephi, is described.

How to do it…

To obtain the modularity score for a graph, follow these steps:

Load the Les Misérables graph in Gephi.
In the Network Overview tab under the Statistics panel, hit the Run button adjacent to Modularity.
In the Modularity settings window, enter a resolution in the textbox depending on whether you want a small or large number of communities:

You can choose to randomize to get a better decomposition of the graph into communities, but this increases the computation time.

You can also choose to include edge weight in computing modularity.
Hit OK once done.
This opens up the Modularity Report window, which the size distribution of communities into various modularity classes. The report also shows the number of communities formed, along with the overall modularity score of the graph:

How it works…

Modularity is defined as the fraction of edges that fall within the given modules to the total number of edges that could have existed among these modules. Mathematically, modularity is computed as , where is the probability that an edge is in module i and is the probability that a random edge would fall into the module i.

Modularity is a measure of the structure of graphical networks. It determines the strength of the network as a whole. It describes how easily a network could be clustered into communities or modules.

A network with high modularity points to strong relationships within the same communities but weaker relationship across different communities. It is one of the fundamental methods used during community detection in graphs. Modularity finds its applications in a wide range of areas such as social networks, biological networks, and collaboration networks.

Finding a graph's PageRank

Just like the HITS algorithm, the PageRank algorithm is a ranking algorithm for the nodes in a graph. It was developed by the founders of Google, Larry Page and Sergey Brin, while they were at Stanford. Later on, Google used this algorithm for ranking webpages in their search results. The PageRank algorithm works on the assumption that a node that receives more links is likely to be an important node in the network. This recipe explains what PageRank actually is and how Gephi could be used to readily compute the PageRank of nodes in a graph.

How to do it…

The following steps describe the process of finding the PageRank of a graph by making use of the capabilities offered by Gephi:

Load the directed version of the Les Misérables network into Gephi.
In the Settings panel, under the Network Overview tab, click on the Run button placed against PageRank.
This opens up the PageRank settings window as shown in the following screenshot:
Choose which version, Directed or UnDirected, you want to use for computing the PageRank.
In the Probability textbox, enter the initial probability value that would serve as the starting PageRank for each of the nodes in the graph.
Enter the stopping criterion value in the Epsilon textbox. The smaller the value of the stopping criterion, the longer the PageRank algorithm will take to converge.
You can choose to include or leave out the edge weight from the computation of the PageRank.
Hit OK once done.
This opens up a new window titled PageRank Report depicting the distribution of the PageRank score over a graph.

The following screenshot shows the distribution of PageRank in the directed Les Misérables network with the initial probability as 0.85 and the epsilon value as 0.001:

running-metrics-filters-and-timelines-img-19

How it works…

The PageRank algorithm, like the HITS algorithm, is a link analysis algorithm and aims to rank the nodes of a graph according to their importance in the network. The PageRank for a node is a measure of the likelihood of arriving at this node starting from any other node in the network through non-random graph traversal.

The PageRank algorithm has found its applications in a wide range of areas including social network analysis, webpage ranking on World Wide Web, search engine optimization, biological networks, chemistry, and so on.

Summary

In this article, we learned how to select a list of metrics for a graph. Then, we explained how to find the average degree as well as the average weighted degree of a graph with the help of Gephi. We also learned how we can use the capabilities of Gephi in order to find the diameter of a network, graph density, and graph modularity. The HITS algorithm and how Gephi is used to compute this metric for a graph were also covered in detail. Finally, we learned about the PageRank algorithm and how Gephi could be used to readily compute the PageRank of nodes in a graph.

Enjoy your journey exploring more with Gephi!