Network visualization has become an important way to view data in our increasingly connected world. Social networks, information networks, transportation networks, and a host of other datasets can be brought to life through network maps. However, this approach was traditionally left to those with an understanding of the complex mathematical underpinnings of graph theory, or at least to exceptional coders who could create their own graph structures. In recent years, the explosion of social media datasets has propelled network graphs into the visualization mainstream, resulting in a number of proprietary and open source tools that address the need to create and view networks. One of the leading tools of this genre is Gephi.
The goal of Gephi is to make network visualization accessible to all by providing a set of tools that handle the complex mathematics supporting the graphs. Users of Gephi are therefore able to focus on the meaning of the underlying data, and can quickly test alternative visual approaches to find the one that best displays the network connections they wish to share with their audience. Gephi is a great tool for those wanting to display insights through their graphs quickly, while also offering the ability to explore the world of graph theory further.
In this chapter, I will provide you with instructions on how to quickly get up and running with Gephi. You will learn how to:
Go to the Gephi website
Ensure you have the necessary software and hardware to run Gephi
Download the current version of the software
Run the installation process on your laptop or desktop
Start the software and verify that it is functioning correctly
Locate and understand the functionality for each of the toolbar icons
Navigate to the Gephi site at http://gephi.org. Take a moment while you visit the site to acquaint yourself with some of the available resources, including quick tips on how the software can be used to create a wide variety of network maps and diagrams.
The download option will provide information about the current version. We will be working from the 0.8.2-beta version for each of the examples in this book.
On the landing page, beneath the download button is a link for system requirements. Click on it to make certain you have the necessary resources to run the Gephi platform successfully. After clicking, you should see something like this:
As you will note on this page, the practical requirements for running Gephi are highly related to the complexity of the datasets that will form your network maps. This is due in part to the graphic-intensive nature of the program; simple diagrams will be computed without difficulty on a machine with just 128 MB of RAM. However, if you are looking to create and analyze complex networks with hundreds of thousands of nodes and edges, it would be wise to use a machine with at least 2 GB of memory.
Note that the examples used in this book will not require significant memory or CPU outlays. So, as long as your machine is capable of handling a few thousand nodes and edges, there should be no issues in executing any of the cases presented here.
Once you have verified the minimal requirements needed to run the software, it's time to begin the download process.
Click the download button, and you will be transported to the download page, with multiple download options, depending on your operating system. Any language and localization preferences you may have can be set after the download. Gephi offers English, French, Spanish, Japanese, Brazilian Portuguese, Russian, Chinese, and Czech localization using the Languages option under the Tools menu. All of the examples used in this book are based on the Windows version using the English language setting.
Once the appropriate software version has been downloaded, locate it on your machine to begin the installation process. Gephi is packaged using an executable setup file, so you should find a filename akin to gephi-0.8.2-beta.setup.exe, depending on the version selected for download.
Click on the downloaded file to start the installation process. Depending on your machine settings, you may encounter a pop-up message asking if you wish to continue with the installation. Simply select the Run option to proceed with the setup.
After navigating through any system messages, the Gephi installation process will initiate, starting with a Setup Wizard window. Click on the Next button to continue.
The setup program will then prompt you for the installation location, defaulting to C:\Program Files (x86)\Gephi-0.8.2. Click on the Next button, which takes you to a Start menu folder option. The default setting here is Gephi. If that's acceptable to you, select Next; otherwise, provide an alternative name, such as Awesome Network Viz Tool, and click on Next to proceed.
Next in a seemingly endless series of options (don't worry, they're almost done!) is a window where Gephi allows you to set a few file defaults. This is valuable if you are importing data from other network visualization tools, such as Cytoscape or Graphviz.
Choose your settings, and the installation wizard provides one last settings confirmation window before starting the install. If you are comfortable with the choices you just made, click on Install. Otherwise, select the Back button to adjust the settings before completing the installation.
Finally, we're ready for the big event! Click on Install, and we're off to the races. The process will typically take less than two minutes, depending on the speed of your local machine. When the installation is successful, Gephi will show the following window:
Now that the install has completed, you'll note that Gephi has given us the option to launch the program immediately. At this point, you probably would like to dig into everything Gephi can do as quickly as possible. So, let's go, keeping the Launch Gephi option checked, and select Finish.
Before moving on to a discussion of Gephi tools and capabilities, let's step back for a moment and make sure we understand the component parts of a network graph. This knowledge is essential to a basic understanding of creating and interpreting network graphs.
Quite simply, graphs are composed of nodes and edges. Nodes are a set of objects representing entities in a dataset. You may think of cities, universities, students, and so on as typical nodes. Edges are the connections between nodes, and provide visual cues as to the degree of connectedness of the graph. Not all nodes are connected to one another; those that are directly connected are referred to as neighbors. Note that a single node may have many edges, depending on the number of neighbors associated with a given node.
Finally, edges in Gephi may be directed or undirected. Many graphs are undirected, meaning there is a symmetrical connection between nodes. A directed graph expresses asymmetric relationships, where there is a specific order between points, typically represented by an arrow pointing from a source node to a target node.
In Gephi, nodes and edges are both represented by specific identifiers, such as ID and Label, and may contain other more descriptive information, such as weight, subgroup, and color.
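To make these terms concrete, here is a minimal Python sketch of a graph held in memory. It is purely illustrative and is not how Gephi stores its data; the Graph class, its method names, and the sample node IDs and labels are all hypothetical names I've chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy graph: nodes carry attributes, edges are stored as adjacency sets."""
    directed: bool = False
    nodes: dict = field(default_factory=dict)      # node ID -> attribute dict
    adjacency: dict = field(default_factory=dict)  # node ID -> set of neighbor IDs

    def add_node(self, node_id, label=None, **attrs):
        # Fall back to the ID as the label (an illustrative choice, not a Gephi rule).
        self.nodes[node_id] = {"label": label if label is not None else str(node_id), **attrs}
        self.adjacency.setdefault(node_id, set())

    def add_edge(self, source, target):
        # Create any missing endpoint so an edge never points at an unknown node.
        for node_id in (source, target):
            if node_id not in self.nodes:
                self.add_node(node_id)
        self.adjacency[source].add(target)
        if not self.directed:
            # An undirected edge is symmetrical, so record it in both directions.
            self.adjacency[target].add(source)

    def neighbors(self, node_id):
        # For a directed graph this returns only the nodes reached by outgoing edges.
        return sorted(self.adjacency[node_id])


undirected = Graph(directed=False)
undirected.add_node("n1", label="Boston", weight=2.0)
undirected.add_node("n2", label="Chicago")
undirected.add_edge("n1", "n2")

directed = Graph(directed=True)
directed.add_edge("n1", "n2")

print(undirected.neighbors("n2"))  # the edge is visible from either end
print(directed.neighbors("n2"))    # the edge only leads away from n1
```

In the undirected graph, n2 lists n1 as a neighbor; in the directed graph it does not, because the single edge runs from the source n1 to the target n2.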
You will see a Welcome window on top of the general user interface. Note that you can elect to not view the Welcome window on subsequent visits by deselecting the checkbox at the lower-left corner of the window. The window provides a few quick startup options, including a handful of sample visualizations provided as part of the installation package.
For now, I'm going to ask you to close this window so we can begin to examine the different components within the base Gephi install. For those of you familiar with the NetBeans IDE, the Gephi layout will be easy to grasp, because it is based on the NetBeans environment. Even if you aren't familiar with NetBeans or Eclipse, you will soon find the user interface to be quite intuitive, and will be navigating between panels and menus flawlessly. Menus and formatting options are laid out around the perimeter of the Gephi workspace, with the main area designed for viewing your data and graphs. This approach keeps almost everything in front of you at all times, and will help you to become familiar with the Gephi layout quickly.
Let's begin our tour by examining the default layout structure. As you become more familiar with Gephi and how you work with it, there will be opportunities to customize the layout. For now, let's stick with the defaults, starting with the main canvas area in the center of the window.
If you've already taken the opportunity to scan the Gephi workspace, most likely you have noticed that all of the options are grayed out. To rectify that, we'll need to open a file.
To get started, you'll want to download some of the sample files from the Gephi site and keep them in an easy to find location. For this example, we'll work with the Jazz musicians' network, a collection of jazz figures and their relationships to one another.
Go to the Gephi wiki at https://wiki.gephi.org/index.php/Datasets and navigate to the dataset, which can be found under the Social Networks heading. Alternatively, you can go straight to https://gephi.org/datasets/jazz.net.zip. Note that this is a .zip file, so you'll need to extract it before we can load it into Gephi.
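If you would rather script the extraction step than use your operating system's file manager, a few lines of Python will do it. This is a minimal sketch using only the standard library; the function name extract_archive and the gephi-samples folder are hypothetical, and I'm assuming you have already saved the archive locally as jazz.net.zip.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Extract every member of a .zip archive and return the member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example call, assuming the download sits in the current folder:
# extract_archive("jazz.net.zip", "gephi-samples")
```

The returned list tells you which files were unpacked, so you know what to look for when you reach the Open dialog in Gephi.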
Now, select Open from the File menu, and then navigate to the file you just downloaded. I've chosen this example for two reasons: first, it will allow us to tour the basic features in Gephi, and second, you will get a glimpse of a relatively simple network before we move on to a greater degree of complexity later in the book.
Actually, it looks quite complex at first glance, with a couple hundred nodes (the circles in the diagram) and the thousands of edges (lines) connecting the related nodes. Fear not, you'll soon find this level of complexity easy to navigate. For now, let's leave the map aside and focus on the many tools Gephi offers to help us make sense out of this or any other network map. The next few sections will provide brief overviews of the functionality provided within Gephi. In later chapters, some of these options will be explored in greater detail.
At the top we have the direct selection arrow, the default, meaning you can click on any node or edge within a diagram. This is followed by a rectangle selection tool, which allows you to draw a rectangle of any size over portions of the diagram, highlighting items in the selected area as you go.
The drag tool, represented by a hand shape, permits you to get a closer look at selected nodes by dragging them to a new location on the canvas. This can be very useful in cases where we have a densely populated canvas area.
If you still need a way to highlight specific elements in your diagram, the next two icons on the toolbar are highly useful, starting with the painter tool that enables us to recolor selected nodes with a simple mouse click. To really focus on a specific node, you can take advantage of the sizer tool, allowing us to increase or decrease the node size simply by moving our mouse up or down.
The brush tool provides another great way to understand the underlying relationships in our visualization. Selecting this tool, we can click on any node and see all the nodes in the diagram that have a direct relationship to our selection. As we'll see in a bit, we can do this in any color we choose, thus making it much easier to see patterns, especially when compared to our original example.
The next two options on the toolbar allow us to draw new nodes with the node pencil, or to add new edges through the edge pencil. Each of these is executed by using simple mouse clicks. In many cases, there will be no need to add nodes or edges in this fashion, but it is nice to have that capability for those occasions where it is needed.
Next, you will see an airplane icon, representing the shortest path function, a great tool for seeing how many connections it takes to navigate between two selected nodes. Simply highlight this icon, then click on two distinct nodes, and watch Gephi display the most direct connection between the two points. In some cases, Gephi will inform us that there are no available connections between the points.
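For the curious, the idea behind a shortest path can be sketched in a few lines of Python. I'm using a breadth-first search here, which finds a path with the fewest edges in an unweighted graph; this is my own illustration and I make no claim about which algorithm Gephi uses internally. The function name and the small sample network are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one path with the fewest edges as a list of nodes, or None."""
    if source not in adjacency or target not in adjacency:
        return None
    previous = {source: None}  # node -> the node we reached it from
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            # Walk backwards from the target to rebuild the route.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency.get(current, ()):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None  # the two nodes are not connected


# A tiny undirected network; "f" has no connections at all.
network = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "f": [],
}

print(shortest_path(network, "a", "e"))
print(shortest_path(network, "a", "f"))
```

The first call returns a route three edges long, while the second returns None, which mirrors the case where Gephi reports that no connection is available between the two points.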
Another useful option comes by way of the heatmap tool, depicted as a gear-shaped icon on the toolbar. This function enables you to see the proximity of related nodes using either a gradient or color palette approach. This provides an effective way to see relationships across the network for a selected node.
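One way to picture the gradient approach is to count the number of hops from the selected node to every other node, then blend between two colors according to that distance. The sketch below does exactly that. It is an assumption-laden illustration rather than Gephi's actual implementation: the function names, the red-to-yellow color pair, and the sample chain of nodes are all hypothetical.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: number of edges from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def gradient_color(distance, max_distance, near=(255, 0, 0), far=(255, 255, 0)):
    """Linearly blend two RGB colors: distance 0 gives near, the maximum gives far."""
    ratio = 0.0 if max_distance == 0 else distance / max_distance
    return tuple(round(n + (f - n) * ratio) for n, f in zip(near, far))


def heatmap(adjacency, source):
    """Map each reachable node to a color based on its hop distance from source."""
    distances = hop_distances(adjacency, source)
    farthest = max(distances.values())
    return {node: gradient_color(d, farthest) for node, d in distances.items()}


# A simple chain a-b-c-d-e, plus an isolated node "x".
chain = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
    "x": [],
}

colors = heatmap(chain, "a")
print(colors)
```

Nodes that cannot be reached from the selected node, such as x here, simply receive no color in this sketch.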
Finally, we have the edit icon, represented by a selection arrow with an adjacent question mark. Selecting this option allows us to view several attributes for our selection, including color, size, x, y, and z positioning, plus ID and label information. Any of these attributes may be edited here, with the exception of the ID field, which remains fixed.
Locate the icon that looks like a magnifying glass. This is the center on graph function, which does exactly what the name specifies. It places our graph back in the center of the workspace. Nothing fancy here, but it can be a very useful function.
Beneath this is the reset colors icon. Tired of the basic default gray color for the graph? Simply right-click on this icon, select a color from the wheel, and then left-click to reset the graph to your new color. Play around with this until you get a color that suits your fancy, or perhaps matches your decorating scheme.
The next three icons all relate to labeling the diagram, starting with reset label color, followed by reset label visible, and finally reset label size. Resetting the label color returns labels to black, while resetting the labels to visible does just that. Finally, resetting the label size returns all labels to the default size setting. These functions will be explored further as we proceed with our examples in upcoming chapters.
The background color option appears as a light bulb. This feature provides a simple toggle between a white background versus a black one. Use your discretion to determine which one looks better—although a dark background may provide a more dynamic appearance, the white one is often a better option for printing purposes.
We next encounter three functions related to how edges are displayed. First, we have the show edges icon that works exactly as expected. Select this option, and all edges will disappear, making it easier to see the distribution of nodes. Click on it again, and the edges reappear. The second icon in this grouping is used to set edges to source node color; toggle this to match the node color or to return to the default edge color. Our third and final function in this trio is to show edge labels, used in the same fashion as the show node labels option.
Finally, we come to the last group of options, all related to formatting our network display. The edge weight scale is a slider bar; drag it to the left for thinner lines, or to the right to increase the weight of the edges. The next two options are each represented by an uppercase "A", one in black text and the second in blue. The first icon is the size mode function, which lets us set labeling options as scaled, or based on the node size. We'll see how this works when we dive into creating our own graphics. The color mode can be set at either an object or unique level for the purpose of highlighting graph features. The font option in this group lets us set the base font and its size. The font size scale function is a slider bar where we can adjust the size relative to the base settings. Slide to the right for a larger font size, and to the left for a smaller one. Please note that either the node or edge labels will need to be turned on to see any impact from this function.
Just two more options before we move on and put some of this functionality to use in our own clever visualizations. First is the default color icon, shown as a colored square based on the current default color setting. Selecting and holding this icon opens a color spectrum window where you can create the color of your dreams and set it as the default. The attributes settings option allows us to set the default labeling options for both nodes and edges, using IDs, labels, or both.
Still with me? I hope you're beginning to grasp the capabilities of Gephi, even as we took a very high-level tour of the basic functionality. As we continue our journey, we'll unleash more of this power and begin to tap into the full potential of this great tool.
In this chapter, you learned how to download Gephi to your local desktop or laptop, and how to initiate the installation process. You also learned how to configure the installation options, including the default file settings to be used with your version of Gephi.
Also, you should have a base-level familiarity with the Gephi workspace, especially the design toolbar. Although you may not yet understand in detail how each of these functions works, you should be prepared to move on and begin working with some of this functionality as we proceed to the next chapter. In our next chapter, you'll discover how to create your own graphs using a number of the tools you just learned about.