Python Web Penetration Testing Cookbook

4 (1 reviews total)
By Cameron Buchanan , Terry Ip , Andrew Mabbitt and 2 more
  • Instant online access to over 7,500+ books and videos
  • Constantly updated with 100+ new titles each month
  • Breadth and depth in over 1,000+ technologies
  1. Gathering Open Source Intelligence

About this book

This book gives you an arsenal of Python scripts perfect to use or to customize your needs for each stage of the testing process. Each chapter takes you step by step through the methods of designing and modifying scripts to attack web apps. You will learn how to collect both open and hidden information from websites to further your attacks, identify vulnerabilities, perform SQL Injections, exploit cookies, and enumerate poorly configured systems. You will also discover how to crack encryption, create payloads to mimic malware, and create tools to output your findings into presentable formats for reporting to your employers.

Publication date:
June 2015
Publisher
Packt
Pages
224
ISBN
9781784392932

 

Chapter 1. Gathering Open Source Intelligence

In this chapter, we will cover the following topics:

  • Gathering information using the Shodan API

  • Scripting a Google+ API search

  • Downloading profile pictures using the Google+ API

  • Harvesting additional results using the Google+ API pagination

  • Getting screenshots of websites using QtWebKit

  • Screenshots based on port lists

  • Spidering websites

 

Introduction


Open Source Intelligence (OSINT) is the process of gathering information from Open (overt) sources. When it comes to testing a web application, that might seem a strange thing to do. However, a great deal of information can be learned about a particular website before even touching it. You might be able to find out what server-side language the website is written in, the underpinning framework, or even its credentials. Learning to use APIs and scripting these tasks can make the bulk of the gathering phase a lot easier.

In this chapter, we will look at a few of the ways we can use Python to leverage the power of APIs to gain insight into our target.

 

Gathering information using the Shodan API


Shodan is essentially a vulnerability search engine. By providing it with a name, an IP address, or even a port, it returns all the systems in its databases that match. This makes it one of the most effective sources for intelligence when it comes to infrastructure. It's like Google for internet-connected devices. Shodan constantly scans the Internet and saves the results into a public database. Whilst this database is searchable from the Shodan website (https://www.shodan.io), the results and services reported on are limited, unless you access it through the Application Programming Interface (API).

Our task for this section will be to gain information about the Packt Publishing website by using the Shodan API.

Getting ready

At the time of writing this, Shodan membership is $49, and this is needed to get an API key. If you're serious about security, access to Shodan is invaluable.

If you don't already have an API key for Shodan, visit www.shodan.io/store/member and sign up for it. Shodan has a really nice Python library, which is also well documented at https://shodan.readthedocs.org/en/latest/.

To get your Python environment set up to work with Shodan, all you need to do is simply install the library using cheeseshop:

$ easy_install shodan

How to do it…

Here's the script that we are going to use for this task:

import shodan
import requests

SHODAN_API_KEY = "{Insert your Shodan API key}" 
api = shodan.Shodan(SHODAN_API_KEY)

target = 'www.packtpub.com'

dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY

try:
    # First we need to resolve our targets domain to an IP
    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target]

    # Then we need to do a Shodan search on that IP
    host = api.host(hostIP)
    print "IP: %s" % host['ip_str']
    print "Organization: %s" % host.get('org', 'n/a')
    print "Operating System: %s" % host.get('os', 'n/a')

    # Print all banners
    for item in host['data']:
        print "Port: %s" % item['port']
        print "Banner: %s" % item['data']

    # Print vuln information
    for item in host['vulns']:
        CVE = item.replace('!','')
        print 'Vulns: %s' % item
        exploits = api.exploits.search(CVE)
        for item in exploits['matches']:
            if item.get('cve')[0] == CVE:
                print item.get('description')
except:
    'An error occured'

The preceding script should produce an output similar to the following:

IP: 83.166.169.231
Organization: Node4 Limited
Operating System: None

Port: 443
Banner: HTTP/1.0 200 OK

Server: nginx/1.4.5

Date: Thu, 05 Feb 2015 15:29:35 GMT

Content-Type: text/html; charset=utf-8

Transfer-Encoding: chunked

Connection: keep-alive

Expires: Sun, 19 Nov 1978 05:00:00 GMT

Cache-Control: public, s-maxage=172800

Age: 1765

Via: 1.1 varnish

X-Country-Code: US


Port: 80
Banner: HTTP/1.0 301 https://www.packtpub.com/

Location: https://www.packtpub.com/

Accept-Ranges: bytes

Date: Fri, 09 Jan 2015 12:08:05 GMT

Age: 0

Via: 1.1 varnish

Connection: close

X-Country-Code: US

Server: packt


Vulns: !CVE-2014-0160
The (1) TLS and (2) DTLS implementations in OpenSSL 1.0.1 before 1.0.1g do not properly handle Heartbeat Extension packets, which allows remote attackers to obtain sensitive information from process memory via crafted packets that trigger a buffer over-read, as demonstrated by reading private keys, related to d1_both.c and t1_lib.c, aka the Heartbleed bug.

I've just chosen a few of the available data items that Shodan returns, but you can see that we get a fair bit of information back. In this particular instance, we can see that there is a potential vulnerability identified. We also see that this server is listening on ports 80 and 443 and that according to the banner information, it appears to be running nginx as the HTTP server.

How it works…

  1. Firstly, we set up our static strings within the code; this includes our API key:

    SHODAN_API_KEY = "{Insert your Shodan API key}" 
    target = 'www.packtpub.com'
    
    dnsResolve = 'https://api.shodan.io/dns/resolve?hostnames=' + target + '&key=' + SHODAN_API_KEY
  2. The next step is to create our API object:

    api = shodan.Shodan(SHODAN_API_KEY)
  3. In order to search for information on a host using the API, we need to know the host's IP address. Shodan has a DNS resolver but it's not included in the Python library. To use Shodan's DNS resolver, we simply have to make a GET request to the Shodan DNS Resolver URL and pass it the domain (or domains) we are interested in:

    resolved = requests.get(dnsResolve)
    hostIP = resolved.json()[target] 
  4. The returned JSON data will be a dictionary of domains to IP addresses; as we only have one target in our case, we can simply pull out the IP address of our host using the target string as the key for the dictionary. If you were searching on multiple domains, you would probably want to iterate over this list to obtain all the IP addresses.

  5. Now, we have the host's IP address, we can use the Shodan libraries host function to obtain information on our host. The returned JSON data contains a wealth of information about the host, though in our case we will just pull out the IP address, organization, and if possible the operating system that is running. Then we will loop over all of the ports that were found to be open and their respective banners:

        host = api.host(hostIP)
        print "IP: %s" % host['ip_str']
        print "Organization: %s" % host.get('org', 'n/a')
        print "Operating System: %s" % host.get('os', 'n/a')
    
        # Print all banners
        for item in host['data']:
            print "Port: %s" % item['port']
            print "Banner: %s" % item['data']
  6. The returned data may also contain potential Common Vulnerabilities and Exposures (CVE) numbers for vulnerabilities that Shodan thinks the server may be susceptible to. This could be really beneficial to us, so we will iterate over the list of these (if there are any) and use another function from the Shodan library to get information on the exploit:

    for item in host['vulns']:
            CVE = item.replace('!','')
            print 'Vulns: %s' % item
            exploits = api.exploits.search(CVE)
            for item in exploits['matches']:
                if item.get('cve')[0] == CVE:
                    print item.get('description')

    That's it for our script. Try running it against your own server.

There's more…

We've only really scratched the surface of the Shodan Python library with our script. It is well worth reading through the Shodan API reference documentation and playing around with the other search options. You can filter results based on "facets" to narrow down your searches. You can even use searches that other users have saved using the "tags" search.

Tip

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

 

Scripting a Google+ API search


Social media is a great way to gather information on a target company or person. Here, we will be showing you how to script a Google+ API search to find contact information for a company within the Google+ social sites.

Getting ready

Some Google APIs require authorization to access them, but if you have a Google account, getting the API key is easy. Just go to https://console.developers.google.com and create a new project. Click on API & auth | Credentials. Click on Create new key and Server key. Optionally enter your IP or just click on Create. Your API key will be displayed and ready to copy and paste into the following recipe.

How to do it…

Here's a simple script to query the Google+ API:

import urllib2

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()
api_response = api_response.split("\n")
for line in api_response:
    if "displayName" in line:
        print line

How it works…

The preceding code makes a request to the Google+ search API (authenticated with your API key) and searches for accounts matching the target; packtpub.com. Similarly to the preceding Shodan script, we set up our static strings including the API key and target:

GOOGLE_API_KEY = "{Insert your Google API key}" 
target = "packtpub.com"

The next step does two things: first, it sends the HTTP GET request to the API server, then it reads in the response and stores the output into an api_response variable:

api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()

This request returns a JSON formatted response; an example snippet of the results is shown here:

In our script, we convert the response into a list so it's easier to parse:

api_response = api_response.split("\n")

The final part of the code loops through the list and prints only the lines that contain displayName, as shown here:

See also…

In the next recipe, Downloading profile pictures using the Google+ API, we will look at improving the formatting of these results.

There's more…

By starting with a simple script to query the Google+ API, we can extend it to be more efficient and make use of more of the data returned. Another key aspect of the Google+ platform is that users may also have a matching account on another of Google's services, which means you can cross-reference accounts. Most Google products have an API available to developers, so a good place to start is https://developers.google.com/products/. Grab an API key and plug the output from the previous script into it.

 

Downloading profile pictures using the Google+ API


Now that we have established how to use the Google+ API, we can design a script to pull down pictures. The aim here is to put faces to names taken from web pages. We will send a request to the API through a URL, handle the response through JSON, and create picture files in the working directory of the script.

How to do it

Here's a simple script to download profile pictures using the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY).read()

json_response = json.loads(api_response)
for result in json_response['items']:
      name = result['displayName']
      print name
      image = result['image']['url'].split('?')[0]
  f = open(name+'.jpg','wb+')
  f.write(urllib2.urlopen(image).read())
  f.close()

How it works

The first change is to store the display name into a variable, as this is then reused later on:

      name = result['displayName']
      print name

Next, we grab the image URL from the JSON response:

image = result['image']['url'].split('?')[0]

The final part of the code does a number of things in three simple lines: firstly it opens a file on the local disk, with the filename set to the name variable. The wb+ flag here indicates to the OS that it should create the file if it doesn't exist and to write the data in a raw binary format. The second line makes a HTTP GET request to the image URL (stored in the image variable) and writes the response into the file. Finally, the file is closed to free system memory used to store the file contents:

  f = open(name+'.jpg','wb+')
  f.write(urllib2.urlopen(image).read())
  f.close()

After the script is run, the console output will be the same as before, with the display names shown. However, your local directory will now also contain all the profile images, saved as JPEG files.

 

Harvesting additional results from the Google+ API using pagination


By default, the Google+ APIs return a maximum of 25 results, but we can extend the previous scripts by increasing the maximum value and harvesting more results through pagination. As before, we will communicate with the Google+ API through a URL and the urllib library. We will create arbitrary numbers that will increase as requests go ahead, so we can move across pages and gather more results.

How to do it

The following script shows how you can harvest additional results from the Google+ API:

import urllib2
import json

GOOGLE_API_KEY = "{Insert your Google API key}"
target = "packtpub.com"
token = ""
loops = 0

while loops < 10:
  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50& pageToken="+token).read()

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

  if len(json_response['items']) == 0:
    break

  for result in json_response['items']:
        name = result['displayName']
        print name
        image = result['image']['url'].split('?')[0]
    f = open(name+'.jpg','wb+')
    f.write(urllib2.urlopen(image).read())
  loops+=1

How it works

The first big change in this script that is the main code has been moved into a while loop:

token = ""
loops = 0

while loops < 10:

Here, the number of loops is set to a maximum of 10 to avoid sending too many requests to the API servers. This value can of course be changed to any positive integer. The next change is to the request URL itself; it now contains two additional trailing parameters maxResults and pageToken. Each response from the Google+ API contains a pageToken value, which is a pointer to the next set of results. Note that if there are no more results, a pageToken value is still returned. The maxResults parameter is self-explanatory, but can only be increased to a maximum of 50:

  api_response = urllib2.urlopen("https://www.googleapis.com/plus/v1/people? query="+target+"&key="+GOOGLE_API_KEY+"&maxResults=50& pageToken="+token).read()

The next part reads the same as before in the JSON response, but this time it also extracts the nextPageToken value:

  json_response = json.loads(api_response)
  token = json_response['nextPageToken']

The main while loop can stop if the loops variable increases up to 10, but sometimes you may only get one page of results. The next part in the code checks to see how many results were returned; if there were none, it exits the loop prematurely:

  if len(json_response['items']) == 0:
    break

Finally, we ensure that we increase the value of the loops integer each time. A common coding mistake is to leave this out, meaning the loop will continue forever:

  loops+=1
 

Getting screenshots of websites with QtWebKit


They say a picture is worth a thousand words. Sometimes, it's good to get screenshots of websites during the intelligence gathering phase. We may want to scan an IP range and get an idea of which IPs are serving up web pages, and more importantly what they look like. This could assist us in picking out interesting sites to focus on and we also might want to quickly scan ports on a particular IP address for the same reason. We will take a look at how we can accomplish this using the QtWebKit Python library.

Getting ready

The QtWebKit is a bit of a pain to install. The easiest way is to get the binaries from http://www.riverbankcomputing.com/software/pyqt/download. For Windows users, make sure you pick the binaries that fit your python/arch path. For example, I will use the PyQt4-4.11.3-gpl-Py2.7-Qt4.8.6-x32.exe binary to install Qt4 on my Windows 32bit Virtual Machine that has Python version 2.7 installed. If you are planning on compiling Qt4 from the source files, make sure you have already installed SIP.

How to do it…

Once you've got PyQt4 installed, you're pretty much ready to go. The following script is what we will use as the base for our screenshot class:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

class Screenshot(QWebView):
    def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

    def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

    def _loadFinished(self, result):
        self._loaded = True

    def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()

        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())

        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image

Create the preceding script and save it in the Python Lib folder. We can then reference it as an import in our scripts.

How it works…

The script makes use of QWebView to load the URL and then creates an image using QPainter. The get_image function takes a single parameter: our target. Knowing this, we can simply import it into another script and expand the functionality.

Let's break down the script and see how it works.

Firstly, we set up our imports:

import sys
import time
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import *

Then, we create our class definition; the class we are creating extends from QWebView by inheritance:

class Screenshot(QWebView):

Next, we create our initialization method:

def __init__(self):
        self.app = QApplication(sys.argv)
        QWebView.__init__(self)
        self._loaded = False
        self.loadFinished.connect(self._loadFinished)

def wait_load(self, delay=0):
        while not self._loaded:
            self.app.processEvents()
            time.sleep(delay)
        self._loaded = False

def _loadFinished(self, result):
        self._loaded = True

The initialization method sets the self.__loaded property. This is used along with the __loadFinished and wait_load functions to check the state of the application as it runs. It waits until the site has loaded before taking a screenshot. The actual screenshot code is contained in the get_image function:

def get_image(self, url):
        self.load(QUrl(url))
        self.wait_load()

        frame = self.page().mainFrame()
        self.page().setViewportSize(frame.contentsSize())

        image = QImage(self.page().viewportSize(), QImage.Format_ARGB32)
        painter = QPainter(image)
        frame.render(painter)
        painter.end()
        return image

Within this get_image function, we set the size of the viewport to the size of the contents within the main frame. We then set the image format, assign the image to a painter object, and then render the frame using the painter. Finally, we return the processed image.

There's more…

To use the class we've just made, we just import it into another script. For example, if we wanted to just save the image we get back, we could do something like the following:

import screenshot
s = screenshot.Screenshot()
image = s.get_image('http://www.packtpub.com')
image.save('website.png')

That's all there is to it. In the next script, we will create something a little more useful.

 

Screenshots based on a port list


In the previous script, we created our base function to return an image for a URL. We will now expand on that to loop over a list of ports that are commonly associated with web-based administration portals. This will allow us to point the script at an IP and automatically run through the possible ports that could be associated with a web server. This is to be used in cases when we don't know which ports are open on a server, rather than when where we are specifying the port and domain.

Getting ready

In order for this script to work, we'll need to have the script created in the Getting screenshots of a website with QtWeb Kit recipe. This should be saved in the Pythonxx/Lib folder and named something clear and memorable. Here, we've named that script screenshot.py. The naming of your script is particularly essential as we reference it with an important declaration.

How to do it…

This is the script that we will be using:

import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

How it works…

We first create our import declarations. In this script, we use the screenshot script we created before and also the requests library. The requests library is used so that we can check the status of a request before trying to convert it to an image. We don't want to waste time trying to convert sites that don't exist.

Next, we import our libraries:

import screenshot
import requests

The next step sets up the array of common port numbers that we will be iterating over. We also set up a string with the IP address we will be using:

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

Next, we create strings to hold the protocol part of the URL that we will be building later; this just makes the code later on a little bit neater:

http = 'http://'
https = 'https://'

Next, we create our method, which will do the work of building the URL string. After we've created the URL, we check whether we get a 200 response code back for our get request. If the request is successful, we convert the web page returned to an image and save it with the filename being the successful port number. The code is wrapped in a try block because if the site doesn't exist when we make the request, it will throw an error:

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

Now that our method is ready, we simply iterate over each port in the port list and call our method. We do this once for the HTTP protocol and then with HTTPS:

for port in portList:
    testAndSave(http, port)
    testAndSave(https, port)

And that's it. Simply run the script and it will save the images to the same location as the script.

There's more…

You might notice that the script takes a while to run. This is because it has to check each port in turn. In practice, you would probably want to make this a multithreaded script so that it can check multiple URLs at the same time. Let's take a quick look at how we can modify the code to achieve this.

First, we'll need a couple more import declarations:

import Queue
import threading

Next, we need to create a new function that we will call threader. This new function will handle putting our testAndSave functions into the queue:

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

Now that we have our new function, we just need to set up a new Queue object and make a few threading calls. We will take out the testAndSave calls from our FOR loop over the portList variable and replace it with this code:

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.deamon = True
    t.start()

s = q.get()

So, our new script in total now looks like this:

import Queue
import threading
import screenshot
import requests

portList = [80,443,2082,2083,2086,2087,2095,2096,8080,8880,8443,9998,4643, 9001,4489]

IP = '127.0.0.1'

http = 'http://'
https = 'https://'

def testAndSave(protocol, portNumber):
    url = protocol + IP + ':' + str(portNumber)
    try:
        r = requests.get(url,timeout=1)

        if r.status_code == 200:
            print 'Found site on ' + url 
            s = screenshot.Screenshot()
            image = s.get_image(url)
            image.save(str(portNumber) + '.png')
    except:
        pass

def threader(q, port):
    q.put(testAndSave(http, port))
    q.put(testAndSave(https, port))

q = Queue.Queue()

for port in portList:
    t = threading.Thread(target=threader, args=(q, port))
    t.deamon = True
    t.start()

s = q.get()

If we run this now, we will get a much quicker execution of our code as the web requests are now being executed in parallel with each other.

You could try to further expand the script to work on a range of IP addresses too; this can be handy when you're testing an internal network range.

 

Spidering websites


Many tools provide the ability to map out websites, but often you are limited to style of output or the location in which the results are provided. This base plate for a spidering script allows you to map out websites in short order with the ability to alter them as you please.

Getting ready

In order for this script to work, you'll need the BeautifulSoup library, which is installable from the apt command with apt-get install python-bs4 or alternatively pip install beautifulsoup4. It's as easy as that.

How to do it…

This is the script that we will be using:

import urllib2 
from bs4 import BeautifulSoup
import sys
urls = []
urls2 = []

tarurl = sys.argv[1] 

url = urllib2.urlopen(tarurl).read()
soup = BeautifulSoup(url)
for line in soup.find_all('a'):
    newline = line.get('href')
    try: 
        if newline[:4] == "http": 
            if tarurl in newline: 
            urls.append(str(newline)) 
        elif newline[:1] == "/": 
            combline = tarurl+newline urls.append(str(combline)) except: 
               pass

    for uurl in urls: 
        url = urllib2.urlopen(uurl).read() 
        soup = BeautifulSoup(url) 
        for line in soup.find_all('a'): 
            newline = line.get('href') 
            try: 
                if newline[:4] == "http": 
                    if tarurl in newline:
                        urls2.append(str(newline)) 
                elif newline[:1] == "/": 
                    combline = tarurl+newline 
                    urls2.append(str(combline)) 
                    except: 
                pass 
            urls3 = set(urls2) 
    for value in urls3: 
    print value

How it works…

We first import the necessary libraries and create two empty lists called urls and urls2. These will allow us to run through the spidering process twice. Next, we set up input to be added as an addendum to the script to be run from the command line. It will be run like:

$ python spider.py http://www.packtpub.com

We then open the provided url variable and pass it to the beautifulsoup tool:

url = urllib2.urlopen(tarurl).read() 
soup = BeautifulSoup(url) 

The beautifulsoup tool splits the content into parts and allows us to only pull the parts that we want to:

for line in soup.find_all('a'): 
newline = line.get('href') 

We then pull all of the content that is marked as a tag in HTML and grab the element within the tag specified as href. This allows us to grab all the URLs listed in the page.

The next section handles relative and absolute links. If a link is relative, it starts with a slash to indicate that it is a page hosted locally to the web server. If a link is absolute, it contains the full address including the domain. What we do with the following code is ensure that we can, as external users, open all the links we find and list them as absolute links:

if newline[:4] == "http": 
if tarurl in newline: 
urls.append(str(newline)) 
  elif newline[:1] == "/": 
combline = tarurl+newline urls.append(str(combline))

We then repeat the process once more with the urls list that we identified from that page by iterating through each element in the original url list:

for uurl in urls:

Other than a change in the referenced lists and variables, the code remains the same.

We combine the two lists and finally, for ease of output, we take the full list of the urls list and turn it into a set. This removes duplicates from the list and allows us to output it neatly. We iterate through the values in the set and output them one by one.

There's more…

This tool can be tied in with any of the functionality shown earlier and later in this book. It can be tied to Getting Screenshots of a website with QtWeb Kit to allow you to take screenshots of every page. You can tie it to the email address finder in the Chapter 2, Enumeration, to gain email addresses from every page, or you can find another use for this simple technique to map web pages.

The script can be easily changed to add in levels of depth to go from the current level of 2 links deep to any value set by system argument. The output can be changed to add in URLs present on each page, or to turn it into a CSV to allow you to map vulnerabilities to pages for easy notation.

About the Authors

  • Cameron Buchanan

    Cameron Buchanan is a penetration tester by trade and a writer in his spare time. He has performed penetration tests around the world for a variety of clients across many industries. Previously, Cameron was a member of the RAF. In his spare time, he enjoys doing stupid things, such as trying to make things fly, getting electrocuted, and dunking himself in freezing cold water. He is married and lives in London.

    Browse publications by this author
  • Terry Ip

    Terry Ip is a security consultant. After nearly a decade of learning how to support IT infrastructure, he decided that it would be much more fun learning how to break it instead. He is married and lives in Buckinghamshire, where he tends to his chickens.

    Browse publications by this author
  • Andrew Mabbitt

    Andrew Mabbitt is a penetration tester living in London, UK. He spends his time beating down networks, mentoring, and helping newbies break into the industry. In his free time, he loves to travel, break things, and master the art of sarcasm.

    Browse publications by this author
  • Benjamin May

    Benjamin May is a security test engineer from Cambridge. He studied computing for business at Aston University. With a background in software testing, he recently combined this with his passion for security to create a new role in his current company. He has a broad interest in security across all aspects of the technology field, from reverse engineering embedded devices to hacking with Python and participating in CTFs. He is a husband and a father.

    Browse publications by this author
  • Dave Mound

    Dave Mound is a security consultant. He is a Microsoft Certified Application Developer but spends more time developing Python programs these days. He has been studying information security since 1994 and holds the following qualifications: C|EH, SSCP, and MCAD. He recently studied for OSCP certification but is still to appear for the exam. He enjoys talking and presenting and is keen to pass on his skills to other members of the cyber security community.

    When not attached to a keyboard, he can be found tinkering with his 1978 Chevrolet Camaro. He once wrestled a bear and was declared the winner by omoplata.

    Browse publications by this author

Latest Reviews

(1 reviews total)
Good

Recommended For You

Hands-On Application Penetration Testing with Burp Suite

Test, fuzz, and break web applications and services using Burp Suite’s powerful capabilities

By Carlos A. Lozano and 2 more
Burp Suite Cookbook

Get hands-on experience in using Burp Suite to execute attacks and perform web assessments

By Sunny Wear