You're reading from  Network Science with Python and NetworkX Quick Start Guide

Product type: Book
Published in: Apr 2019
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789955316
Edition: 1st Edition
Author: Edward L. Platt

Edward L. Platt creates technology for communities and communities for technology. He is currently a researcher at the University of Michigan School of Information and the Center for the Study of Complex Systems. He has published research on large-scale collective action, social networks, and online communities. He was formerly a staff researcher at the MIT Center for Civic Media. He contributes to many free/open source software projects, including tools for media analysis, network science, and cooperative organizations. He has also done research on quantum computing and fault tolerance. He has an M.Math in Applied Mathematics from the University of Waterloo, as well as B.S. degrees in both Computer Science and Physics from MIT.

From Data to Networks

To analyze a system using NetworkX, that system must first be modeled as a network, and then be represented as an object within NetworkX. This chapter explains the basic process of creating network representations of data. The first section covers the part of the process that takes place in your head: modeling data as a network. The remaining sections demonstrate the part of the process that happens in code: creating a NetworkX Graph from data, using two different methods. In the first method, data is reformatted into one of the standard network formats supported by NetworkX. In the second method, for more complex data, a network is created from scratch, by using code to add nodes and edges one at a time.

In this chapter, we will cover the following topics:

  • Modeling data: Giving meaning to nodes and edges
  • Network files: Saving your networks to files
  • Networks...

Modeling your data

When representing data as a network, there are many decisions to make along the way. Different types of networks are helpful for understanding different types of data and for asking different types of questions. This section will take a closer look at some of the important considerations.

When creating a network from data, one of the most important questions to consider is what exactly the nodes and edges should represent. Often there are many possibilities, even for the same dataset. Any particular choice focuses on some aspects of the data, possibly discarding others. Networks are fundamentally about relationships and connections, so one way to define nodes and edges is to think about what types of relationships you're interested in. Some possibilities include the following:

  • Social relationships, such as friendships, romantic relationships, or even rivalries...

Reading and writing network files

NetworkX provides support for reading and writing many network file formats. Of course, if a network has been provided in one of these formats, it will be very easy to load into NetworkX! But even if you have data in another format, it is often possible to convert it to one of the supported formats without too much difficulty (I would guess that 90% of network science work is converting data between formats; most of the rest is complaining about converting data). Spreadsheets, for instance, can often be converted to an appropriate format just by reordering columns and exporting as tab-separated values (TSV format). This section will describe several common formats, including adjacency list, edge list, GEXF, and JSON.
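The spreadsheet conversion mentioned above can be sketched with nothing but the standard library. This is a minimal illustration, not the book's own code: the column names (`weight`, `source`, `target`) and the sample rows are hypothetical, and in-memory strings stand in for real files.

```python
import csv
import io

# Hypothetical spreadsheet export (CSV) whose columns are in the wrong
# order for an edge list: the attribute comes before the two endpoints.
spreadsheet = io.StringIO(
    "weight,source,target\n"
    "3,alice,bob\n"
    "1,bob,carol\n"
)

# Reorder the columns to "source, target, weight" and write them out
# as tab-separated values.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
for row in csv.DictReader(spreadsheet):
    writer.writerow([row["source"], row["target"], row["weight"]])

tsv_text = out.getvalue()
print(tsv_text)
```

Each output line now lists the two endpoints first, followed by the extra column, which is the layout that edge-list style readers generally expect.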

The edge list format is a simple but useful plain-text format. It supports edge attributes, but not node attributes. Edge lists...
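As a rough sketch of how an edge list with one edge attribute might be loaded, the example below uses `nx.parse_edgelist` and `nx.generate_edgelist`. The node names and weights are made up for illustration, and the lines are held in a Python list rather than a file to keep the example self-contained.

```python
import networkx as nx

# Hypothetical edge-list lines: source, target, then one edge attribute.
lines = [
    "alice bob 3",
    "bob carol 1",
    "carol alice 2",
]

# `data` names each extra column and gives the type to convert it to.
G = nx.parse_edgelist(lines, nodetype=str, data=(("weight", int),))

print(G.number_of_nodes(), G.number_of_edges())
print(G["alice"]["bob"]["weight"])

# Round trip: generate_edgelist yields lines in the same plain-text format.
round_trip = list(nx.generate_edgelist(G, data=["weight"]))
G2 = nx.parse_edgelist(round_trip, nodetype=str, data=(("weight", int),))
```

Note that only edge data survives the round trip: anything stored on the nodes themselves has no place in this format.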

Creating a network with code

So far, you've got some handy network formats in your toolbox. But, if your data is too complex or too messy to easily convert into one of the previous formats, you might have to build your network from scratch, adding edges and nodes one at a time. Luckily, the techniques you learned in Chapter 2, Working with Networks in NetworkX, are all you really need! This section walks through a practical example of building a network programmatically from a real data set.

The example in this section is a word co-occurrence network. These networks are used to understand the relationship between words in a particular set of documents. In a co-occurrence network, nodes represent words and edge weights represent how many documents they appear in together. Here, "document" could mean any collection of words: blog post, paragraph, sentence, carefully arranged...
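The idea can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the chapter's actual example: the three "documents" are invented, and the tokenization (lowercasing and splitting on whitespace) is a simplifying assumption.

```python
from itertools import combinations

import networkx as nx

# Hypothetical documents; here each one is a short sentence.
documents = [
    "the monster spoke",
    "the creator fled",
    "the monster fled",
]

G = nx.Graph()
for doc in documents:
    # Use a set so each word counts at most once per document.
    words = sorted(set(doc.lower().split()))
    # Every pair of distinct words in the same document co-occurs once.
    for u, v in combinations(words, 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

print(G["the"]["monster"]["weight"])
```

Adding edges one at a time like this makes it easy to change what counts as a document or a word without touching the rest of the code.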

Summary

This chapter demonstrated the process of getting data into NetworkX for analysis. It discussed the types of questions that are important to consider when creating a network from data, and applied them to the example of Wikipedia. It also gave examples of loading networks from standard formats and building networks from scratch. The next chapter introduces affiliation networks, which have two types of nodes.

References

The following resource provides further reading:

  • Shelley, Mary Wollstonecraft. (1818). Frankenstein; or The Modern Prometheus. Urbana, Illinois: Project Gutenberg. Retrieved February 21, 2016, from www.gutenberg.org/ebooks/19033.