
You're reading from Python Data Science Essentials - Third Edition

Product type: Book
Published in: Sep 2018
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781789537864
Edition: 3rd Edition
Author: Alberto Boschetti

Alberto Boschetti is a data scientist with expertise in signal processing and statistics. He holds a Ph.D. in telecommunication engineering and currently lives and works in London. In his work projects, he faces challenges ranging from natural language processing (NLP) and behavioral analysis to machine learning and distributed processing. He is very passionate about his job and always tries to stay updated about the latest developments in data science technologies, attending meet-ups, conferences, and other events.

Social Network Analysis

Social network analysis, usually referred to as SNA, models and studies the relationships among a group of social entities that form a network. An entity can be a person, a computer, or a web page, and a relationship can be a like, link, or friendship (that is, a connection between entities).

In this chapter, you'll learn about the following:

  • Graphs, since social networks are usually represented in this form
  • Important algorithms that are used to gain insights from a graph
  • How to load, dump, and sample large graphs

Introduction to graph theory

Basically, a graph is a data structure that's able to represent relations in a collection of objects. Under this paradigm, the objects are the graph's nodes and the relations are the graph's links (or edges). The graph is directed if the links have an orientation (conceptually, they're like the one-way streets of a city); otherwise, the graph is undirected. In the following table, examples of well-known graphs are provided:

Graph example          Type        Nodes             Edges
World Wide Web         Directed    Web pages         Links
Facebook               Undirected  People            Friendship
Twitter                Directed    People            Follower
IP network             Undirected  Hosts             Wires/Connections
Navigation systems     Directed    Places/Addresses  Streets
Wikipedia              Directed    Pages             Anchor links
Scientific literature  Directed    Papers            Citations

Markov...
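The difference between directed and undirected graphs described above can be sketched in a few lines of NetworkX. This is only an illustrative example: the node names "alice" and "bob" and the variable names are made up for the sketch and do not come from the chapter.

```python
import networkx as nx

# Undirected graph: a friendship link works in both directions
# ("alice" and "bob" are hypothetical node names).
friends = nx.Graph()
friends.add_edge("alice", "bob")

# Directed graph: a "follows" link has an orientation.
follows = nx.DiGraph()
follows.add_edge("alice", "bob")  # alice follows bob, not the reverse

mutual_friendship = friends.has_edge("bob", "alice")
mutual_follow = follows.has_edge("bob", "alice")

print("Friendship visible from bob's side:", mutual_friendship)
print("Follow link visible from bob's side:", mutual_follow)
```

In the undirected case, the edge can be queried from either endpoint; in the directed case, only the orientation that was added exists.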

Graph algorithms

To get insights from graphs, many algorithms have been developed. In this chapter, we'll use a well-known graph available in NetworkX: the Krackhardt Kite graph. It is a toy graph containing 10 nodes, and it is typically used to test and demonstrate graph algorithms. David Krackhardt is the creator of the structure, which has the shape of a kite. It's composed of two different zones. In the first zone (composed of nodes 0 to 6), the nodes are interlinked; in the other zone (nodes 7 to 9), they are connected as a chain:

In: import networkx as nx
import matplotlib.pyplot as plt

G = nx.krackhardt_kite_graph()
nx.draw_networkx(G)
plt.show()

In the following plot, you can examine the Krackhardt Kite's graph structure:
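As a textual complement to the plot, the two zones of the kite can also be inspected directly. The following is a minimal sketch; the variable names (n_nodes, n_edges, tail, degrees) are chosen here for illustration only.

```python
import networkx as nx

G = nx.krackhardt_kite_graph()

# Overall size of the graph
n_nodes = G.number_of_nodes()
n_edges = G.number_of_edges()
print("Nodes:", n_nodes, "Edges:", n_edges)

# Neighbors of the nodes in the tail of the kite (nodes 7 to 9)
tail = {node: sorted(G.neighbors(node)) for node in (7, 8, 9)}
print("Tail neighbors:", tail)

# Degree (number of links) of every node
degrees = dict(G.degree())
print("Degrees:", degrees)
```

The tail nodes have few neighbors, while the nodes in the body of the kite have noticeably higher degrees.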

Let's start with connectivity. Two nodes of a graph are connected if there is at least one path (that is, a sequence of nodes linked by edges) between them.

If at least one path exists, the shortest path between...
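Connectivity and shortest paths on the kite graph can be checked with standard NetworkX functions. This is a small sketch rather than the chapter's own listing; the choice of nodes 0 and 9 as endpoints is an assumption made for the example.

```python
import networkx as nx

G = nx.krackhardt_kite_graph()

# Is every node reachable from every other node?
connected = nx.is_connected(G)

# Is there at least one path between node 0 and node 9?
reachable = nx.has_path(G, source=0, target=9)

# One shortest path between the two nodes, and its length in edges
path = nx.shortest_path(G, source=0, target=9)
length = nx.shortest_path_length(G, source=0, target=9)

print("Graph connected:", connected)
print("Path from 0 to 9 exists:", reachable)
print("Shortest path:", path, "with", length, "edges")
```

Since node 9 sits at the end of the chain, any path to it must pass through nodes 7 and 8.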

Graph loading, dumping, and sampling

Beyond NetworkX, graphs and networks can be generated and analyzed with other software. One of the best open source, multiplatform tools for analyzing them is Gephi. It's a visual tool that doesn't require programming skills, and it's freely available at http://gephi.github.io.

As with machine learning datasets, graphs have standard formats for storing, loading, and exchanging them. This way, you can create a graph with NetworkX, dump it to a file, and then load and analyze it with Gephi.

One of the most frequently used formats is Graph Modeling Language (GML). Now, let's see how we can dump a graph into a GML file:

In: dump_file_base = "dumped_graph"

# Be sure the dump_file file doesn't exist
def remove_file(filename):
    import os
    if os.path.exists(filename):
...
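Because the listing above is cut off, here is a separate, minimal sketch of a GML round trip using nx.write_gml and nx.read_gml. It is not the chapter's code: the temporary directory and the kite_example.gml filename are assumptions made so that the example cleans up after itself.

```python
import os
import tempfile

import networkx as nx

G = nx.krackhardt_kite_graph()

# Write to a throwaway directory (hypothetical filename) so no stale
# file has to be removed beforehand.
with tempfile.TemporaryDirectory() as tmp_dir:
    gml_path = os.path.join(tmp_dir, "kite_example.gml")
    nx.write_gml(G, gml_path)   # dump the graph to a GML file
    H = nx.read_gml(gml_path)   # load it back into a new graph

# GML stores node labels as text, so the reloaded node labels may be
# strings rather than the original integers.
print("Reloaded nodes:", H.number_of_nodes())
print("Reloaded edges:", H.number_of_edges())
```

A file written this way can also be opened in Gephi, since GML is one of the formats it imports.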

Summary

In this chapter, we learned what a social network is, including how to create, modify, and represent one, and we covered some of the important measures of a social network and its nodes. Finally, we discussed how to load and save large graphs and how to handle them.

With this chapter, almost all of the essential data science algorithms have been presented. Machine learning techniques were discussed in Chapter 4, Machine Learning, and social network analysis methods were discussed here. We will finally discuss the most advanced and cutting-edge techniques of deep learning and neural networks in the next chapter, Deep Learning Beyond the Basics.
