Network Graph Analysis and Visualization with Gephi
Gephi is a versatile and powerful tool that helps you create simple network visualizations quickly, while also providing the capabilities to build complex graphs based on large datasets. In this article by Ken Cherven, author of Network Graph Analysis and Visualization with Gephi, you will learn some of the fundamentals of Gephi and network visualization, so that you can begin creating your own graphs.
In this article we will cover a few basic graph theory concepts and then take a brief look at some of the standard Gephi layout algorithms. This will provide you with the essentials to understand network graphs and how to quickly create them using Gephi.
Basic network graph terminology
Network graphs are built from two constructs: nodes and edges. Nodes represent points or entities within the data, while edges are the connections, drawn as lines, between nodes. Individual nodes might be students in a school, or schools within an educational system, or perhaps agencies within a government structure.
Individual nodes may be represented through equal sizes, but can also be depicted as smaller or larger based on the magnitude of a selected measure. For example, a node with many connections may be portrayed as far larger and thus more influential than a sparsely connected node. This approach will provide viewers with a visual cue that shows them where the highest (and lowest) levels of activity occur within a graph.
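To make the idea of sizing nodes by a measure concrete, here is a minimal sketch that counts the connections of each node in a small, made-up edge list and maps the counts linearly onto a range of display sizes. The node names, the `degree_counts` and `scale_sizes` helpers, and the size range are all hypothetical choices for illustration; Gephi performs this kind of scaling through its own interface.

```python
from collections import Counter

# Hypothetical undirected edge list: each pair is one connection.
edges = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat"), ("Dan", "Eve")]


def degree_counts(edge_list):
    """Count how many connections touch each node."""
    counts = Counter()
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return counts


def scale_sizes(counts, min_size=10.0, max_size=50.0):
    """Map each node's count linearly onto the range [min_size, max_size]."""
    low, high = min(counts.values()), max(counts.values())
    span = high - low
    if span == 0:
        # Every node has the same count, so draw them all at equal size.
        return {node: min_size for node in counts}
    return {
        node: min_size + (count - low) / span * (max_size - min_size)
        for node, count in counts.items()
    }


degrees = degree_counts(edges)
sizes = scale_sizes(degrees)
```

In this toy network Ann has the most connections and receives the largest size, while Eve, with a single connection, receives the smallest.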
Nodes will generally be positioned based on the similarity of their individual connections, leading to clusters of individual nodes within a larger network. In most layout algorithms, nodes with higher levels of connections will also tend to be positioned near the center of the graph, while those with few connections will move toward the perimeter of the display.
Edges are the connections between nodes, and may be displayed as undirected or directed paths. Undirected relationships indicate a connection that flows in both directions, while a directed relationship moves in a single direction that always originates in one node and moves toward another node. Undirected connections will tend to predominate in most cases, such as in social networks where participant activity flows in both directions. On occasion, we will see directed connections, as in the case of some transportation or technology networks where there are connections that flow in a single direction.
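The difference between undirected and directed edges can be sketched with a simple adjacency structure. The `build_adjacency` helper and the node names below are hypothetical; the point is only that an undirected edge is recorded in both directions, while a directed edge is recorded from its origin to its destination.

```python
def build_adjacency(edge_list, directed=False):
    """Return a dict mapping each node to the set of nodes it connects to."""
    adjacency = {}
    for source, target in edge_list:
        adjacency.setdefault(source, set()).add(target)
        # Make sure the target appears as a node even if nothing leaves it.
        adjacency.setdefault(target, set())
        if not directed:
            # An undirected edge flows both ways, so record the reverse too.
            adjacency[target].add(source)
    return adjacency


edges = [("A", "B"), ("B", "C")]
undirected = build_adjacency(edges)
directed = build_adjacency(edges, directed=True)
```

With the same two edges, node B reaches both A and C in the undirected version, but only C in the directed version.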
Edges may also be weighted, to show the strength of the connection between nodes. In the case where University A has performed research with both University B and University C, the strength (width) of the edge will show viewers where the stronger relationship exists. If A and B have combined for three projects, while A and C have collaborated on nine projects, we should weight the A to C connection three times that of the A to B connection.
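The university example can be written out as a small weighted edge list. The sketch below computes relative edge widths from the project counts and serializes the edges as CSV text. The helper names are hypothetical, and the `Source`, `Target`, and `Weight` column names follow the convention commonly used for edge tables imported into Gephi; treat them as an assumption to check against your own version.

```python
import csv
import io

# Project counts from the example: A and B share 3 projects, A and C share 9.
weighted_edges = [("A", "B", 3), ("A", "C", 9)]


def relative_widths(edge_list, base_width=1.0):
    """Scale widths so that the lightest edge is drawn at base_width."""
    lightest = min(weight for _, _, weight in edge_list)
    return {
        (source, target): base_width * weight / lightest
        for source, target, weight in edge_list
    }


def to_edge_csv(edge_list):
    """Serialize the edges as CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edge_list)
    return buffer.getvalue()


widths = relative_widths(weighted_edges)
csv_text = to_edge_csv(weighted_edges)
```

Here the A to C edge comes out three times as wide as the A to B edge, matching the ratio of nine projects to three.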
Another commonly used term is neighbors: a neighbor is simply a node that is directly connected to another node. Neighbors are said to be one degree apart. The term degree is used in two senses. It refers to the number of connections flowing into (or away from) a node, a measure also known as Degree Centrality, and it also refers to the number of connections that must be traversed to reach another node via the shortest possible path. In complex graphs, you may find nodes that are four, five, or even more degrees away from one another, and in some cases two nodes may not be connected at all.
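These terms map onto a few lines of code. The sketch below uses a small hypothetical network to look up neighbors, count a node's connections, and find how many degrees apart two nodes are with a breadth-first search. The network and function names are invented for illustration.

```python
from collections import deque

# Hypothetical undirected network; "F" has no connections at all.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
    "F": set(),
}


def neighbors(node):
    """Nodes directly connected to the given node (one degree apart)."""
    return adjacency[node]


def degree(node):
    """Number of connections attached to the given node."""
    return len(adjacency[node])


def distance(start, goal):
    """Fewest edges between start and goal, or None if they are not connected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, steps = queue.popleft()
        for nxt in adjacency[current]:
            if nxt == goal:
                return steps + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return None
```

For example, `distance("C", "E")` follows the path C, A, B, D, E and reports four degrees of separation, while `distance("A", "F")` returns `None` because F is not connected to the rest of the network.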
Now that you have a very basic understanding of network graph theory, let's learn about some of the common network graph algorithms.
Common network graph algorithms
Before we introduce you to some specific graph algorithms, we'll briefly discuss some of the theory behind network graphs and introduce you to a few of the terms you will frequently encounter.
Network graphs are drawn through positioning nodes and their respective connections relative to one another. In the case of a graph with 8 or 10 nodes, this is a rather simple exercise, and could probably be drawn rather accurately without the help of complex methodologies. However, in the typical case where we have hundreds of nodes with thousands of edges, the task becomes far more complex.
Some of the more prominent classes of layout in Gephi include the following:
- Force-directed algorithms refer to a category of layouts that position elements based on the principles of attraction, repulsion, and gravity
- Circular algorithms position graph elements around the perimeter of one or more circles, and may allow the user to dictate the order of the elements based on data properties
- Dual circle layouts position a subset of nodes in the interior of the graph with the remaining nodes around the perimeter, similar to a single circular graph
- Concentric layouts arrange the graph nodes using an approximately circular graph design, with less connected nodes at the perimeter of the graph and highly connected nodes typically near the center
- Radial axis layouts let the user control part of the graph layout by choosing the attributes used to group nodes
The type of graph you select may well be dictated by the sort of results you seek. If you wish to feature certain groups within your dataset, one of the layouts that allows you to segment the graph by groups will provide a potentially quick solution to your needs. In this instance, one of the circular or radial axis graphs may prove ideal.
On the other hand, if you are attempting to discover relationships in a complex new dataset, one of the several available Force-directed layouts is likely a better choice. These algorithms will rely on the values in your dataset to determine the positioning within the graph. When choosing one of these approaches, please note that there will often be an extensive runtime to calculate the graph layout, especially as the data becomes more complex. Even on a powerful computer, examples may run for minutes or hours in an attempt to fully optimize the graph. Fortunately, you will have the ability in Gephi to stop these algorithms at any given point, and you will still have a very reasonable, albeit imperfect graph.
In the next section, we'll look at a few of the standard layouts that are part of the Gephi base package.
Standard network graph layouts
Now that you are somewhat familiar with the types of layout algorithms, we'll take a look at what Gephi offers within the Layout tab. We'll begin with some common Force-directed approaches, and then examine some of the other choices.
One of the best known force algorithms is Force Atlas, which in Gephi provides users with multiple options for drawing the graph. Foremost among these are the Repulsion, Attraction, and Gravity settings. Adjusting the Repulsion strength changes how strongly nodes push away from one another. A higher repulsion level, for example, will push nodes further apart. Conversely, setting the Attraction strength higher will force related nodes closer together. Finally, the Gravity setting will draw nodes closer to the center of the graph if it is set to a high level, and disperse them toward the edges if a very low value is set.
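The interplay of the three forces can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Gephi's Force Atlas implementation: the `force_layout` function, its default values, the specific force formulas (repulsion falling off with distance, attraction and gravity growing with distance), and the fixed step size are all choices made here for illustration.

```python
import math


def force_layout(nodes, edges, repulsion=10.0, attraction=0.1, gravity=0.1,
                 iterations=300, step=0.1):
    """Toy force-directed layout. nodes is a list; returns {node: (x, y)}."""
    # Start from evenly spaced points on a unit circle so runs are repeatable.
    count = len(nodes)
    pos = {
        node: [math.cos(2 * math.pi * i / count), math.sin(2 * math.pi * i / count)]
        for i, node in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Repulsion: every pair of nodes pushes apart, harder when closer.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist_sq = (dx * dx + dy * dy) or 1e-9  # avoid dividing by zero
                fx = repulsion * dx / dist_sq
                fy = repulsion * dy / dist_sq
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Attraction: each edge pulls its two endpoints toward each other.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Gravity: every node is pulled toward the center at (0, 0).
        for node in nodes:
            force[node][0] -= gravity * pos[node][0]
            force[node][1] -= gravity * pos[node][1]
        # Move each node a small step along its net force.
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return {node: (x, y) for node, (x, y) in pos.items()}


def gap(layout, a, b):
    """Straight-line distance between two laid-out nodes."""
    return math.hypot(layout[a][0] - layout[b][0], layout[a][1] - layout[b][1])


nodes, edges = ["A", "B"], [("A", "B")]
baseline = gap(force_layout(nodes, edges), "A", "B")
strong_gravity = gap(force_layout(nodes, edges, gravity=1.0), "A", "B")
strong_repulsion = gap(force_layout(nodes, edges, repulsion=40.0), "A", "B")
```

With just two connected nodes, the behavior matches the description of the settings: raising gravity leaves the pair closer together than the baseline, and raising repulsion leaves them further apart. The `iterations` argument plays the role of stopping the algorithm early, since a smaller value returns whatever positions have been reached so far.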
Force Atlas 2 is another layout option that employs slightly different parameters than the original Force Atlas method. You may wish to compare these methods and determine which one gives you better results.
Fruchterman Reingold is one more Force-directed method, albeit one that provides you with just three parameters: Area, Gravity, and Speed. While the general approach is not unlike the Force Atlas algorithms, your results will look visibly different.
Finally, Gephi provides three Yifan Hu methods. These models (Yifan Hu, Yifan Hu Proportional, and Yifan Hu Multilevel) are likely to run much more rapidly than the methods discussed earlier, while providing generally similar results.
Gephi also provides a variety of methods that do not employ the force approach. Some of the models, as we noted earlier in this article, provide you with more control over the final result. This control may come from choosing how to order the nodes, or from choosing which attributes to use in grouping nodes together, either through color or location.
Earlier we mentioned several of these layout options, but in the interest of space we'll take a closer look at just two of them: the Circular and Radial Axis layouts.
Circular layouts are well suited to relatively small networks, given the limited flexibility of their fixed layout. We can adjust this to some degree by specifying the diameter of the graph, but anything more than a few dozen well-connected nodes often becomes difficult to manage. However, with smaller networks, these layouts can be intriguing, providing us with the ability to see patterns within and between specific groups more easily than we might see them in some other layouts.
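A circular layout is simple enough to sketch directly. The hypothetical `circular_layout` function below spaces nodes evenly around a circle of a given diameter and optionally orders them by a data property, here a made-up connection count. It illustrates the general idea only and is not Gephi's own implementation.

```python
import math


def circular_layout(nodes, diameter=200.0, order_key=None):
    """Place nodes evenly around a circle, optionally sorted by order_key."""
    ordered = sorted(nodes, key=order_key) if order_key else list(nodes)
    if not ordered:
        return {}
    radius = diameter / 2
    step = 2 * math.pi / len(ordered)  # angle between adjacent nodes
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }


# Hypothetical connection counts used to order the nodes around the circle.
degrees = {"A": 3, "B": 1, "C": 2, "D": 1}
positions = circular_layout(
    degrees,
    diameter=200.0,
    order_key=lambda node: (-degrees[node], node),  # most connected first
)
```

Because every node sits on the same circle, the only choices left are the diameter and the ordering, which is why ordering by a meaningful property matters so much in this layout.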
While this article will not cover any filtering options, those too can help us make better use of the circular layouts, by giving us the ability to highlight specific groups and their resulting connections. Think of the circle as a giant spider web filled with connections, and the filters as tools that help us see specific threads within the web.
Our final notes are on Radial Axis layouts, which can provide us with fascinating looks at our data, especially if there are natural groups within the network. Think of a school with several classrooms full of students, for example. Each of these classrooms can be easily identified and grouped, perhaps by color. In a complex Force-directed graph, we may be able to spot each of these groups, but doing so may become difficult due to the interactions with other classes. In a Radial Axis layout, we can dictate the group definitions, forcing each group to be bound together, apart from any other groups.
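The classroom example can be sketched as follows. Each group is assigned its own axis radiating out from the center, and the members of the group are placed along that axis. The `radial_axis_layout` function, its spacing parameters, and the student names are hypothetical, and the placement rule is an assumption made for illustration rather than a description of Gephi's algorithm.

```python
import math


def radial_axis_layout(groups, inner_radius=50.0, spacing=20.0):
    """groups maps a group name to a list of nodes. Returns {node: (x, y)}."""
    positions = {}
    names = sorted(groups)
    for index, name in enumerate(names):
        # Each group gets its own axis, spread evenly around the center.
        angle = 2 * math.pi * index / len(names)
        for rank, node in enumerate(groups[name]):
            # Members of a group line up along the axis, moving outward.
            distance = inner_radius + rank * spacing
            positions[node] = (distance * math.cos(angle), distance * math.sin(angle))
    return positions


classrooms = {
    "Room 1": ["Ann", "Bob", "Cat"],
    "Room 2": ["Dan", "Eve"],
    "Room 3": ["Fay", "Gus"],
}
positions = radial_axis_layout(classrooms)
```

Every student in a classroom shares one axis, so groups stay visually separate no matter how their members are connected.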
There are pros and cons to this approach, of course, as there are with any of the other methods. If we wish to understand how a specific group interacts with another group, this method can prove beneficial, as it isolates these groups visually, making it easier to see connections between them. On the negative side, it is often quite difficult to see connections between members within the group, due to the nature of the layout. As with any layout, it is critical to look at the results and see how they apply to our original need. Always test your data using multiple layout algorithms, so that you wind up with the best possible approach.
Gephi is an ideal tool for users new to network graph analysis and visualization, as it provides a rich set of tools to create and customize network graphs. The user interface makes it easy to understand basic concepts such as nodes and edges, as well as descriptive terminology such as neighbors, degrees, repulsion, and attraction. New users can move as slowly or as rapidly as they wish, given Gephi's gentle learning curve.
Gephi can also help you see and understand patterns within your data through a variety of sophisticated graph methods that will appeal to both novice and seasoned users. These layout algorithms give you the opportunity to experiment with multiple layouts as you search for the best approach to display your data. In short, Gephi provides everything needed to produce first-rate network visualizations.
About the Author
Ken Cherven is a marketing analyst working in the automotive sector in Detroit, Michigan, USA. He has more than 15 years' experience working with proprietary tools from Microsoft, Cognos, Tableau, and Oracle, in addition to extensive experience using a variety of open source software applications including MySQL, SpagoBI, JasperServer, BIRT, Mondrian, R, Gephi, Exhibit, Omeka, and d3.
Ken also maintains the visual-baseball.com site, where he uses available open source and proprietary tools to analyze, report on, and visualize baseball information. The site features many of his baseball visualization projects, including a collection of more than 100 seasons of interactive pennant race charts.
One of Ken's current projects is to publish a visual history of major league baseball pennant races from 1901 through 2012, using a dashboard approach featuring horizon charts, box plots, bullet charts, and other visuals to tell the story of each and every race in a highly visual fashion. This book is scheduled for a 2013 release.