





















































In this article by Devangana Khokhar, the author of Gephi Cookbook, we'll learn about the statistical properties of graphical networks and how they can exploit these properties with the help of Gephi.
Gephi provides some ready-to-use ways to study the statistical properties of graphical networks. These statistical properties include the properties of the network as a whole, as well as individual properties of nodes and edges within the network. This article will enable you to learn some of these properties and how to use Gephi to explore them. So let's get started!
(For more resources related to this topic, see here.)
Gephi offers a wide variety of metrics for exploring graphs. These metrics allow users to explore graphs from various perspectives. In this recipe, we will learn how to access these different metrics for a specified graph.
Load a graph of your choice in Gephi.
To view different metrics available in Gephi for exploring a graph, follow these steps:
The degree of a node in a graph is defined as the number of edges that are incident on that node. The loops—that is, the edges that have the same node as their starting and end point—are counted twice. In this recipe, we will learn how to find the average degree and average weighted degree for a graph.
The following steps illustrate the process to find the average degree and weighted degree of a graph:
The graph in the preceding screenshot depicts the degree distribution for the Les Misérables network. This pop-up window has options for printing, copying, and/or saving the degree report.
The average degree of the Les Misérables network is now displayed in the Statistics panel beside the Run button for Average Degree, as shown in the following screenshot:
This will open up a window containing the weighted degree report of the Les Misérables network, as shown in the following screenshot:
The average weighted degree of the Les Misérables graph is now also displayed in the Statistics panel that is adjacent to the Run button for Avg. Weighted Degree:
The average degree for a graph is the measure of how many edges there are in the graph compared to its number of vertices. To find out the average degree for a graph, Gephi computes the sum of the degrees of individual nodes in the graph and divides that by the number of nodes present in it.
To find the average weighted degree for a graph with weighted edges, Gephi computes the average mean of the sum of the weights of the incident edges on all the nodes in the graph.
If you have closed the report window and wish to see it once again, click on the small button with a question mark adjacent to the Run button. This will reopen the degree report.
The diameter of a network refers to the length of the longest of all the computed shortest paths between all pair of nodes in the network.
The following steps describe how to find the diameter of a network using the capabilities offered by Gephi:
The diameter of a network gives us the maximum number of hops that must be made to travel from one node in the graph to the other. To find the diameter, all the shortest paths between every pair of nodes in the graph are computed and then the length of the longest of them gives us the diameter of the network. If the network is disconnected—that is, if the network has multiple components that do not share an edge between them—then the diameter of such a network is infinite.
Note that, in the case of weighted graphs, the longest path that determines the diameter of the graph is not the actual length of the path but the number of hops that would be required to traverse from the starting vertex to the end vertex.
The computation of the diameter of a graphical network makes use of a property called the eccentricity of nodes. The eccentricity of a node is a measure of the number of hops required to reach the farthest node in the graph from this node. The diameter is then the maximum eccentricity among all the nodes in the graph.
There are three concepts—betweenness centrality, closeness centrality, and eccentricity—that have been introduced in this recipe. Eccentricity has already been covered in the How it works… section of this recipe. Betweenness centrality and closeness centrality are yet more important statistical properties of a network and are applied in a lot of real-world problems such as finding influential people in a social network, finding crucial hubs in a computer network, finding congestion nodes in wireless networks, and so on.
The betweenness centrality of a node is an indicator of its centrality or importance in the network. It is described as the number of shortest paths from all the vertices to all the other vertices in the network that pass through the node in consideration.
The closeness centrality of a node measures how accessible every other node in the graph is from the considered node. It is defined as the inverse of the sum of shortest distances of every other node in the network from the current node. Closeness centrality is an indicator of the speed at which information will transfuse into the network, starting from the current node.
Yet another concept that has been mentioned in this recipe is the radius of the graph. The radius of a graph is the opposite of its diameter. It is defined as the minimum eccentricity among the vertices of the graph. In other words, it refers to the minimum number of hops that are required to reach from one node of the graph to its farthest node.
One another important statistical metric for graphs is density. In this recipe, you will learn what graph density is and how to compute it in Gephi.
The following steps illustrate how to use Gephi to figure out the graph density for a chosen graph:
A complete graph is a graph in which every pair of nodes is connected via a direct edge. The density of a graph is a measure of how close the graph is to a complete graph with the same number of nodes. It is defined as the ratio of the total number of edges present in a graph to the total number of edges possible in the graph. The total number of edges possible in a simple undirected graph is mathematically computed as (N(N-1))/2, where N is the number of nodes in the graph. A simple graph is a graph that has no loops and not more than one edge between the same pair of nodes.
The density of the undirected version of a graph with n nodes will be twice of that of the directed version of the graph. This is because, in a directed graph, there are two edges possible between every pair of nodes, each with a different direction.
Hyperlink-Induced Topic Search (HITS) is also known as hubs and authorities. It is a link analysis algorithm and is used to evaluate the relationship between the nodes in a graph. This algorithm aims to find two different scores for each node in the graph: authority, which indicates the value of the information that the node holds, and hub, which indicates the value of the link information to the other linked nodes to this node. In this recipe, you will learn about HITS and how Gephi is used to compute this metric for a graph.
Considering the directed version of the Les Misérables network, the following steps describe the process of determining the HITS score for a graph in Gephi:
The HITS algorithm was developed by Professor Jon Kleinberg from the department of computer science at Cornell University at around the same time as the PageRank algorithm was being developed. The HITS algorithm is a link analysis algorithm that helps in identifying the crucial nodes in a graph. It assigns two scores, a hub score and an authority score, to each of the nodes in the graph. The authority score of a node is a measure of the amount of valuable information that this node holds. The hub score of a node depicts how many highly informative nodes or authoritative nodes this node is pointing to. So a node with a high hub score shows that this node is pointing to many other authoritative nodes and hence serves as a directory to the authorities. On the other hand, a node with a high authoritative score shows that it is pointed to by a large number of nodes and hence serves as a node of useful information in the network.
One thing that you might have noticed is the Epsilon or stopping criterion for the HITS algorithm being mentioned in one of the steps of the recipe. Computation of HITS makes use of matrices and something called Eigenvalues. The value of Epsilon instructs the algorithm to stop when the difference between eigenvalues of the matrices for two consecutive iterations becomes negligibly small. The detailed discussion of eigenvalues and any mathematical treatment of the HITS algorithm are outside the scope of this article but there are some really good resources available online that explain these concepts very well. Some of these resources are also mentioned in the See also section of this recipe.
Since its introduction, there has been a plethora of research on applications of the HITS algorithm to real-world problems such as finding pages with valuable information on the World Wide Web, a problem otherwise known as webpage ranking. There also has been intensive research on improving the time complexity of the HITS algorithm. A simple search on http://scholar.google.com/ for HITS will reveal some of the interesting research that has been, and is being, carried out in this domain.
The modularity of a graph is a measure of its strength and describes how easily the graph could be disintegrated into communities, modules, or clusters. In this recipe, the concept of modularity, along with its implementation in Gephi, is described.
To obtain the modularity score for a graph, follow these steps:
You can choose to randomize to get a better decomposition of the graph into communities, but this increases the computation time.
You can also choose to include edge weight in computing modularity.
Modularity is defined as the fraction of edges that fall within the given modules to the total number of edges that could have existed among these modules. Mathematically, modularity is computed as , where
is the probability that an edge is in module i and
is the probability that a random edge would fall into the module i.
Modularity is a measure of the structure of graphical networks. It determines the strength of the network as a whole. It describes how easily a network could be clustered into communities or modules.
A network with high modularity points to strong relationships within the same communities but weaker relationship across different communities. It is one of the fundamental methods used during community detection in graphs. Modularity finds its applications in a wide range of areas such as social networks, biological networks, and collaboration networks.
Just like the HITS algorithm, the PageRank algorithm is a ranking algorithm for the nodes in a graph. It was developed by the founders of Google, Larry Page and Sergey Brin, while they were at Stanford. Later on, Google used this algorithm for ranking webpages in their search results. The PageRank algorithm works on the assumption that a node that receives more links is likely to be an important node in the network. This recipe explains what PageRank actually is and how Gephi could be used to readily compute the PageRank of nodes in a graph.
The following steps describe the process of finding the PageRank of a graph by making use of the capabilities offered by Gephi:
The following screenshot shows the distribution of PageRank in the directed Les Misérables network with the initial probability as 0.85 and the epsilon value as 0.001:
The PageRank algorithm, like the HITS algorithm, is a link analysis algorithm and aims to rank the nodes of a graph according to their importance in the network. The PageRank for a node is a measure of the likelihood of arriving at this node starting from any other node in the network through non-random graph traversal.
The PageRank algorithm has found its applications in a wide range of areas including social network analysis, webpage ranking on World Wide Web, search engine optimization, biological networks, chemistry, and so on.
In this article, we learned how to select a list of metrics for a graph. Then, we explained how to find the average degree as well as the average weighted degree of a graph with the help of Gephi. We also learned how we can use the capabilities of Gephi in order to find the diameter of a network, graph density, and graph modularity. The HITS algorithm and how Gephi is used to compute this metric for a graph were also covered in detail. Finally, we learned about the PageRank algorithm and how Gephi could be used to readily compute the PageRank of nodes in a graph.
Enjoy your journey exploring more with Gephi!
Further resources on this subject: