You're reading from  Python for Secret Agents - Volume II - Second Edition

Product type: Book
Published in: Dec 2015
Reading level: Intermediate
ISBN-13: 9781785283406
Edition: 2nd Edition
Authors (2):
Steven F. Lott

Steven Lott has been programming since computers were large, expensive, and rare. Working for decades in high tech has exposed him to a lot of ideas and techniques; some are bad, but most are helpful to others. Since the 1990s, Steven has been engaged with Python, crafting an array of indispensable tools and applications. His profound expertise has led him to contribute significantly to Packt Publishing, penning notable titles like "Mastering Object-Oriented Python," "The Modern Python Cookbook," and "Functional Python Programming." A self-proclaimed technomad, Steven lives an unconventional lifestyle on a boat, often anchored along the vibrant east coast of the US. He tries to live by the words “Don't come home until you have a story.”

Chapter 3. Following the Social Network

Intelligence gathering is really networking. It's networking with an avowed purpose of learning something new. It's an essentially social game. Most agents have connections; the more successful agents seem to have the most connections. When you read historical accounts of the British MI5/SIS agent code-named Garbo, you'll see how a vast and sophisticated social network is essential to espionage.

We'll leverage Twitter to gather pictures and text. We'll explore the Twitter Application Program Interface (API) to see what people are doing. The Twitter API uses Representational State Transfer (REST) as its protocol. We'll use Python's http.client to connect with RESTful web services like Twitter.
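
A RESTful request is little more than a resource path plus a query string, sent over HTTPS. Here is a minimal sketch of that idea using only the standard library. The helper names build_resource_path() and get_json() are hypothetical, the "/1.1/...json" path layout is an assumption about the Twitter REST API of this period, and the authentication headers a real request needs are deliberately left out.

```python
import http.client
import json
import urllib.parse


def build_resource_path(resource, params, version="1.1"):
    """Build the path and query string for a REST GET request.

    The version prefix and ".json" suffix are assumptions, not
    something the chapter text confirms.
    """
    query = urllib.parse.urlencode(params)
    return "/{0}/{1}.json?{2}".format(version, resource, query)


def get_json(host, path, headers=None):
    """GET a resource over HTTPS and decode the JSON body.

    A real Twitter request also needs an Authorization header;
    this sketch leaves that to the caller.
    """
    connection = http.client.HTTPSConnection(host, timeout=10)
    try:
        connection.request("GET", path, headers=headers or {})
        response = connection.getresponse()
        body = response.read().decode("utf-8")
        return response.status, json.loads(body)
    finally:
        connection.close()


# Only the path is built here; no network traffic is generated.
path = build_resource_path(
    "statuses/user_timeline", {"screen_name": "PacktPub", "count": 20})
print(path)
```

In practice, a wrapper package handles the authentication handshake and calls like get_json() for us, which is why the rest of this chapter leans on one.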

We can use the Twitter APIs to discover the extent of a social network. We'll try to discern the interactions one person has. We can use this to find the active connections among people. It requires some statistical care, but we can make steps toward discerning...

Background briefing – images and social media


We'll use the Pillow implementation of the PIL package to extract and convert any graphics or images. In Chapter 1, New Missions, New Tools, we used the pip3.4 program to install Pillow 2.9.0. The PIL package has modules that allow us to convert images to a common format. It also allows us to create thumbnails of images. This can help us build a tidy summary of images we collected.

Most importantly, it allows us to validate an image file. It turns out that the compression algorithms used on some images can be hacked. Someone can tweak the bytes of an image so that it appears to be infinitely large. This will cause the computer opening the image to get progressively slower until the image processing application finally crashes. A basic counter-intelligence ploy is to circulate damaged image files that leave agents struggling to figure out what went wrong.
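
The validation idea can be sketched roughly as follows. This is one possible approach, not necessarily the one the chapter develops: the function name safe_thumbnail() and the max_pixels limit are assumptions chosen for illustration. It relies on Pillow's Image.open(), verify(), convert(), and thumbnail() methods, and it checks the declared size before any pixel data is decoded.

```python
import io

from PIL import Image


def safe_thumbnail(image_bytes, size=(128, 128), max_pixels=20000000):
    """Return an RGB thumbnail, or None if the bytes look unsafe.

    safe_thumbnail and max_pixels are hypothetical names; the pixel
    limit is an arbitrary choice for this sketch.
    """
    try:
        # verify() checks the file structure without decoding pixels.
        probe = Image.open(io.BytesIO(image_bytes))
        probe.verify()
    except (OSError, SyntaxError, ValueError):
        return None
    # An image object can't be used after verify(), so reopen it.
    image = Image.open(io.BytesIO(image_bytes))
    width, height = image.size
    if width * height > max_pixels:
        # Reject images that claim to be implausibly large.
        return None
    image = image.convert("RGB")
    image.thumbnail(size)  # Resizes in place, keeping the aspect ratio.
    return image
```

A caller can treat a None result as "do not open this file" and move on to the next image.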

The PIL module is an important piece of counter-counter-intelligence. We don't want to accept...

Who's doing the talking?


We'll use the TwitterAPI module to gather information about people by using the Twitter social network. This is not necessarily the "best" social network, but it's widely used and has a good API. Other social networking software is also important and worthy of study. We have to begin somewhere, and Twitter seems to have a fairly low barrier to entry.

In Chapter 1, New Missions, New Tools, we downloaded the TwitterAPI package. For information on how to use this package, visit http://pythonhosted.org/TwitterAPI/.

The first step to using Twitter is to have a Twitter account. This is easy and free. Agents who don't have a Twitter account can sign up at http://www.twitter.com. Once signed up, agents might want to follow the Twitter feed of PacktPub (https://twitter.com/PacktPub) to see how Twitter works.

An agent will need to provide a mobile phone number to Twitter to create applications. The information is available here: https://support.twitter.com/articles/110250-adding-your-mobile...

What do they seem to be talking about?


Finding the social network is only the first step. We also want to examine the conversation. We'll look at two aspects of this conversation: words and pictures. Our first background mission in this section was to be sure we had Pillow working properly. This will also help us download pictures.

Words are somewhat simpler. Interestingly, the tweet content isn't obvious in the Twitter API definitions. It turns out that "status" is what we're looking for. The resource called statuses/user_timeline has the tweets made by a given user.

Each status or tweet is packaged with a collection of entities. These are the URL references, media attachments, @ user_mentions, # hashtags, and $ symbols. The entities are separated from the body of the tweet, which greatly simplifies our analysis.
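
Because the entities arrive already parsed, pulling them out is mostly dictionary navigation. The following sketch works on hand-built sample data rather than a live response; the sample statuses and the function names are made up for illustration, and the field names ("user_mentions", "screen_name", "hashtags", "text") are assumed to match what the API returns.

```python
from collections import Counter


def mentions_and_hashtags(status):
    """Pull the mentioned screen names and hashtag texts from one status."""
    entities = status.get("entities", {})
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    return mentions, hashtags


def mention_counts(statuses):
    """Count how often each screen name is mentioned across statuses."""
    counts = Counter()
    for status in statuses:
        mentions, _ = mentions_and_hashtags(status)
        counts.update(mentions)
    return counts


# Hypothetical data shaped like a trimmed-down timeline response.
sample_statuses = [
    {"text": "Meet @agent_b at the #deaddrop",
     "entities": {"user_mentions": [{"screen_name": "agent_b"}],
                  "hashtags": [{"text": "deaddrop"}]}},
    {"text": "@agent_b and @agent_c, check in",
     "entities": {"user_mentions": [{"screen_name": "agent_b"},
                                    {"screen_name": "agent_c"}],
                  "hashtags": []}},
]

print(mention_counts(sample_statuses).most_common())
```

Counting mentions this way gives a rough picture of who a person actually talks to, as opposed to who they merely follow.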

Here's a function to get the last 20 tweets from a user:

def tweets_by_screen_name(screen_name):
    api = TwitterAPI(consumer_key,
                     consumer_secret,
          ...

What are they posting?


To gather images being posted, we'll modify our query that retrieves tweets. We'll get the media URL from the tweet, use urllib.request to get the image file, and use Pillow to confirm that it's a valid image and create a thumbnail of the image. While there are a lot of steps, each of them is something we've already seen.

We'll break this function into two parts: the Twitter part and the image processing part. Here's the first part, making the essential Twitter request:

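import io
import urllib.parse
import urllib.request

from PIL import Image
from TwitterAPI import TwitterAPI

# consumer_key and consumer_secret are the application credentials
# issued by Twitter; they must be defined before this function is called.
def tweet_images_by_screen_name(screen_name):
    api = TwitterAPI(consumer_key,
                     consumer_secret,
                     auth_type='oAuth2')
    # Ask for the 30 most recent tweets from this user's timeline.
    response = api.request('statuses/user_timeline',
                           {'screen_name': screen_name, 'count': 30})
    for item in response.json():
        text = item['text']
        entities = item['entities']
        # Only tweets with attachments have a 'media' entity.
        if 'media' in entities:
            media_list...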
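
The listing above is cut off, so what follows is not its continuation. It's a separate, minimal sketch of the "get the media URL from the tweet" step, run against a hand-built status. The names media_urls() and local_name(), the sample status, and the "media_url" field name are assumptions for illustration.

```python
import posixpath
import urllib.parse


def media_urls(status):
    """Return the media URLs attached to one status, if any."""
    media_list = status.get("entities", {}).get("media", [])
    return [m["media_url"] for m in media_list if "media_url" in m]


def local_name(url):
    """Derive a local file name from the last segment of the URL path."""
    path = urllib.parse.urlparse(url).path
    return posixpath.basename(path)


# Hypothetical status with a single photo attachment.
sample_status = {
    "text": "Surveillance photo",
    "entities": {
        "media": [
            {"type": "photo",
             "media_url": "http://pbs.twimg.com/media/EXAMPLE.jpg"},
        ],
    },
}

for url in media_urls(sample_status):
    print(url, "->", local_name(url))
```

Each URL found this way can then be fetched with urllib.request and handed to Pillow for validation.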

Deep Under Cover – NLTK and language analysis


As we study Twitter more and more, we see that they've made an effort to expose numerous details of the social network. They've parsed the Tweet to extract hashtags and user mentions, and they've carefully organized the media. This makes a great deal of analysis quite simple.

On the other hand, some parts of the analysis are still quite difficult. The actual topic of a Twitter conversation is just a string of characters. It's essentially opaque until a person reads the characters to understand the words and the meaning behind the words.

Understanding natural-language text is a difficult problem. We often assign it to human analysts. If we can dump the related tweets into a single easy-to-read document, then a person can scan it, summarize, and decide if this is actionable intelligence or just background noise.
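
Dumping tweets into a readable document needs nothing beyond the standard library. This is a minimal sketch under a few assumptions: the function name tweets_to_document() is made up, and each status is assumed to carry "text", "created_at", and a nested "user" with a "screen_name", as timeline responses generally do.

```python
import textwrap


def tweets_to_document(statuses, width=72):
    """Format statuses as numbered, wrapped paragraphs for a human reader."""
    paragraphs = []
    for number, status in enumerate(statuses, start=1):
        author = status.get("user", {}).get("screen_name", "unknown")
        when = status.get("created_at", "unknown time")
        header = "[{0}] @{1} at {2}".format(number, author, when)
        # Indent the body so each tweet stands apart from its header.
        body = textwrap.fill(status.get("text", ""), width=width,
                             initial_indent="    ",
                             subsequent_indent="    ")
        paragraphs.append(header + "\n" + body)
    return "\n\n".join(paragraphs)
```

The resulting string can be written to a text file and handed to an analyst for a quick scan.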

One of the truly great counter-intelligence missions is Operation Mincemeat. There are many books that describe this operation. What's important about...

Summary


We discussed the basics of automated analysis of the social network. We looked at one particular social network: the people who use Twitter to exchange messages. At the time of writing, this was about 316 million active users exchanging about 500 million messages a day. We saw how to find information about specific people, about the list of friends a person follows, and the tweets a person makes.

We also discussed how to download additional media from social networking sites. We used PIL to confirm that an image is safe to work with. We also used PIL to create thumbnails of images. We can do a great deal of processing to gather and analyze data that people readily publish about themselves.

In the next chapter, we'll look at another source of data that's often difficult to work with. The ubiquitous PDF file format is difficult to process without specialized tools. The file is designed to allow consistent display and printing of documents. It's not, however, too helpful for analysis of content. We'll need...
