Packt+ | Advance your knowledge in tech

You're reading from Mastering Data Mining with Python - Find patterns hidden in your data

Product typeBook

Published inAug 2016

Reading LevelIntermediate

Publisher

ISBN-139781785889950

Edition1st Edition

Languages

Python

Tools

NLTK Scikit-learn

Concepts

Data Mining

Author (1)

Megan Squire

Chapter 4. Network Analysis

Humans are very social creatures, and our ability to find connections – with each other and with other things in our lives – is one of our strongest impulses. We naturally love to connect with others, we distinguish our connections with different names or levels (friend, spouse, acquaintance, lover, enemy, BFF, frenemy, boss, employee, stranger, co-worker, neighbor), and we sometimes keep these connections for years or decades. We are fascinated by seeing people from our network appearing in other, seemingly unconnected networks. We love the notion that there might be only six (or four, or three) degrees of separation between any two people on Earth. The small world phenomenon reminds us that we are more closely connected to each other than it may appear.

So far in this book, we have experimented a lot with finding connections between things, first by finding items that are commonly associated, and then by finding entities that appear different but are really the...

What is a network?

Networks are all around us. We use the word network to refer to many different types of connected things: multiple computers hooked together, a system of cities connected by small roads and large highways, a group of people who all work in the same industry, a series of television stations that broadcast common programming. In common usage, the word network can refer to almost any set of interconnected entities.

From a data mining perspective, however, our use of the word network is more precise. We use the word network to refer to a system that can be represented by a graph made up of nodes and links. In our specialized vocabulary, nodes are the things being connected, and the links are the relationships between the nodes. The collection of all the nodes and links is called a graph. Note that we are using the word graph in its mathematical sense, not in the sense of a visualization, like a bar graph or a line graph. In graph theory vocabulary, nodes are also called vertices...

Measuring a network

Much of the analysis of a network is actually just measuring its various parts and pieces. How many nodes does it have? How are those nodes connected to each other? How many links does it have and how many ways can we traverse those edges? In this section, we will learn many of the common ways to measure a network.

Degree of a network

One way to describe a network is through its degree distribution. The degree of a node is the number of its connected edges. In an undirected graph, the degree of a node is the count of all the edges coming out of it. The degree distribution tells us how many nodes had a degree of 0, how many had a degree of 1, then 2, and so on. Figure 4 shows a histogram of the degree distribution for a simple undirected graph. Two of the nodes have a degree of three, two of the nodes have a degree of two, and one node has a degree of one:

Figure 4. Simple undirected graph and its degree distribution

Figure 5 shows some alternative shapes for degree distributions...

Representing graph data

The theoretical aspects of networks are important, but in order to be able to apply these ideas to a real-world problem, we have to first transform our data into a format that a network analysis program can understand. In this section, we will discover the common formats for representing data in a network-friendly way.

Adjacency matrix

An adjacency matrix is a convenient way to represent graph data. To construct an adjacency matrix for an undirected, unweighted graph, we can create a grid that has all the nodes listed across the top as columns, and also down the side of the grid as rows. Then we use a 1 or 0 to indicate whether there is a link between those two nodes. Consider the unweighted, undirected graph shown in Figure 12:

Figure 12. A simple unweighted, undirected graph

The adjacency matrix for the Figure 12 graph can be written like this:

  A B C D E F
A 0 1 1 1 0 0
B 1 0 1 1 0 0
C 1 1 0 0 1 1
D 1 1 0 0 0 0
E 0 0 1 0 0 0
F 0 0 1 0 0 0

A few things jump out right...

A real project

To put our new knowledge about graphs and social networks to use, we will use data about the software developers working on free, libre, and open source (FLOSS) projects developed using the Ruby programming language. As we learned in Chapter 3, Entity Matching when we tackled entity matching between projects, many Ruby programmers used the website RubyForge.org between 2003 and 2013 to create projects and collaborate. In this chapter, we are going to use this same data to learn how the social structure of this community changed over those ten years.

RubyForge developers can be placed into a social network where the developers themselves are the nodes or vertices, and the fact that they worked on a project together represents the edge or link between the nodes. We could also count how many projects they worked on together to create a weight for the link. If two developers only worked together once, the link is weaker than if the two developers worked together on dozens of projects...

Summary

In this chapter, we learned the basics of network analysis and graph theory, including how to measure a network and describe its properties. We learned why the degree, distance, and centrality of a network are important. We also investigated the various graph data formats that are used in network analysis, and considered which ones are most effective for which types of graphs. Finally, we implemented a real-world project where we build networks of software developers that had worked together in the RubyForge ecosystem. We learned various techniques for exploring the networks, including how to build smaller and more detailed networks, and how to explore component subgraphs. We discovered a few techniques for focusing on a single node and building out ego networks, and eventually we implemented those ego networks into a view of a component and its change over time.

In the next few chapters, we will turn our attention to text mining. Specifically, in Chapter 5, Sentiment Analysis in...

The rest of the chapter is locked

You have been reading a chapter from

Mastering Data Mining with Python - Find patterns hidden in your data

Published in: Aug 2016Publisher: ISBN-13: 9781785889950

A free Packt account unlocks extra newsletters, articles, discounted offers, and much more. Start advancing your knowledge today.

undefined

Unlock this book and the full library FREE for 7 days

Get unlimited access to 7000+ expert-authored eBooks and videos courses covering every tech area you can think of

Start free trial

Renews at $15.99/month. Cancel anytime

Author (1)

Megan Squire

Megan Squire is a professor of computing sciences at Elon University. Her primary research interest is in collecting, cleaning, and analyzing data about how free and open source software is made. She is one of the leaders of the FLOSSmole.org, FLOSSdata.org, and FLOSSpapers.org projects.
Read more about Megan Squire

Personalised recommendations for you

Based on your interests and search pattern

Et al.

Ever wonder why speech recognition systems don't understand the Scottish accent, or what would happen if an astronaut only ate mac 'n' cheese, or other spurious reflections you'd have at a bar? We did, then collated those deliberations into absurd research articles with fake figures and methodologies inspired by even more fictionally absurd studies.

BookAug 2023230 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages4

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages1

Generative AI with LangChain

This book is a comprehensive introduction to LLMs and LangChain, demystifying the basic mechanics of LangChain, its functionalities, and the myriad of applications it can be integrated into.

BookDec 2023360 pages5

Mastering Tableau 2023

This book is a comprehensive resource to mastering your Tableau skills and becoming a BI expert. As you progress, you will learn how to build advanced dashboards and improve your storytelling to derive key business insight, as well as make you well-versed with advanced functionalities of Tableau in the business intelligence domain.

BookAug 2023684 pages

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages5

Building AI Applications with ChatGPT APIs

This guide covers all ChatGPT API features for effortless creation of robust AI powered apps. With its help, you’ll be able to leverage ChatGPT’s cutting-edge NLP models to take your app development skills to the next level. You’ll also work on ten exciting projects that will give you the practical know-how that you can apply to your existing applications.

BookSep 2023258 pages2

Data Engineering with AWS

Embark on a journey to master data engineering pipelines on AWS! Our book offers a hands-on experience of AWS services for ingesting, transforming, and consuming data. Whether you're an absolute beginner or someone with basic data engineering experience, this guide is an indispensable resource.

BookOct 2023636 pages5

Modern Data Architecture on AWS

Every organization wants an agile, performant, and cost-effective data platform that meets all their current and future business needs. Purpose-built AWS analytics services and their features play a big part in building such a modern data platform. This book brings to you all the design and architectural patterns that’ll help you achieve this goal.

BookAug 2023420 pages5

Practical Guide to Applied Conformal Prediction in Python

Discover the power of Conformal Prediction with the "Practical Guide to Applied Conformal Prediction in Python." Master the latest techniques to quantify uncertainty in machine learning and computer vision models, and seamlessly apply them to your industry applications.

BookDec 2023240 pages

TinyML Cookbook

With over 70 project-based recipes, the TinyML Cookbook is a practical guide that will help you to get the most out of your microcontrollers. It provides a comprehensive understanding of the theoretical foundations while giving you hands-on experience training ML models for deployment on Arduino Nano 33 BLE Sense, Raspberry Pi Pico, and SparkFun RedBoard Artemis Nano microcontrollers.

BookNov 2023664 pages