Making a simple cURL request (Simple)

by Jacob Ward | August 2013 | Open Source

This article by Jacob Ward, the author of Instant PHP Web Scraping, walks through the steps of making a simple cURL request.

In PHP, the most common way to retrieve a web resource, in this case a web page, is to use the cURL library, which enables our PHP script to send HTTP requests to our target web server and receive its responses.

When we visit a web page in a client, such as a web browser, an HTTP request is sent. The server then responds by delivering the requested resource, for example an HTML file, to the browser, which interprets the HTML and renders it on screen according to any associated styling specification. When we make a cURL request, the server responds in the same way, and we receive the source code of the web page, which we are then free to use as we wish; in this case, we scrape the data we require from the page.

Getting ready

In this article we will use cURL to request and download a web page from a server.

How to do it...

  1. Enter the following code into a new PHP project:

    <?php

    // Function to make GET request using cURL
    function curlGet($url) {

        $ch = curl_init(); // Initialising cURL session

        // Setting cURL options
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_URL, $url);

        $results = curl_exec($ch); // Executing cURL session

        curl_close($ch); // Closing cURL session

        return $results; // Return the results
    }

    $packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');

    echo $packtPage;
    ?>

  2. Save the project as 2-curl-request.php (ensure you use the .php extension!).
  3. Execute the script.
  4. Once our script has completed, we will see the source code of http://www.packtpub.com/oop-php-5/book displayed on the screen.

How it works...

Let's look at how we performed the previously defined steps:

  1. The first line, <?php, and the last line, ?>, indicate where our PHP code block begins and ends. All the PHP code should appear between these two tags.
  2. Next, we create a function called curlGet(), which accepts a single parameter $url, the URL of the resource to be requested.
  3. Running through the code inside the curlGet() function, we start off by initializing a new cURL session as follows:

    $ch = curl_init();

  4. We then set our options for cURL as follows:

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    // Tells cURL to return the results of the request (the source
    // code of the target page) as a string.

    curl_setopt($ch, CURLOPT_URL, $url);
    // Here we tell cURL the URL we wish to request. Notice that it is
    // the $url variable that we passed into the function as a parameter.

  5. We execute our cURL request, storing the returned string in the $results variable as follows:

    $results = curl_exec($ch);

  6. Now that the cURL request has been made and we have the results, we close the cURL session by using the following code:

    curl_close($ch);

  7. At the end of the function, we return the $results variable, which contains the requested page, so that the rest of our script can use it:

    return $results;

  8. Once the function has been defined, we can call it anywhere in the rest of our script.
  9. Having decided on the URL we wish to request, http://www.packtpub.com/oop-php-5/book, we call the function, passing the URL as a parameter and storing the returned data in the $packtPage variable as follows:

    $packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');

  10. Finally, we echo the contents of the $packtPage variable (the page we requested) to the screen by using the following code:

    echo $packtPage;
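
The function above returns whatever curl_exec() gives back, and curl_exec() returns FALSE when a request fails. As a minimal sketch of how a failure might be detected, the following variant also reports the cURL error message. The function name curlGetChecked(), the 10-second connection timeout, and the shape of the returned array are assumptions made for illustration and are not part of the original recipe.

```php
<?php

// Hypothetical variant of curlGet() that reports failures.
// The name and the returned array layout are illustrative assumptions.
function curlGetChecked($url) {

    $ch = curl_init(); // Initialising cURL session

    // Setting cURL options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // Stop trying to connect after 10 seconds

    $results = curl_exec($ch); // FALSE if the request failed

    // The error message has to be read before the session is closed
    $error = ($results === FALSE) ? curl_error($ch) : '';

    curl_close($ch); // Closing cURL session

    return array(
        'body'  => ($results === FALSE) ? NULL : $results,
        'error' => $error,
    );
}
```

A caller could then check whether 'body' is NULL and, if so, log the 'error' string rather than echoing an empty page.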

There's more...

There are a number of different HTTP request methods, which indicate to the server the desired response or the action to be performed. The request method used in this article is cURL's default, GET. This tells the server that we would like to retrieve a resource.

Depending on the resource we are requesting, a number of parameters may be passed in the URL. For example, when we perform a search on the Packt Publishing website for a query, say, php, we notice that the URL is http://www.packtpub.com/books?keys=php. This is requesting the resource books (the page that displays search results) and passing a value of php to the keys parameter, indicating that the dynamically generated page should show results for the search query php.
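
As a small sketch of how such a URL could be assembled in PHP rather than typed by hand, the following helper uses the built-in http_build_query() function, which URL-encodes each parameter name and value. The helper name buildUrl() is an assumption made for illustration.

```php
<?php

// Hypothetical helper: appends query parameters to a base URL.
function buildUrl($baseUrl, array $params) {
    if (empty($params)) {
        return $baseUrl; // Nothing to append
    }
    // http_build_query() URL-encodes each name and value
    return $baseUrl . '?' . http_build_query($params);
}

echo buildUrl('http://www.packtpub.com/books', array('keys' => 'php')), "\n";
// Prints: http://www.packtpub.com/books?keys=php
```

The resulting string can be passed straight to curlGet() as the $url parameter.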

More cURL Options

Of the many cURL options available, only two have been used in our preceding code: CURLOPT_RETURNTRANSFER and CURLOPT_URL. We will cover many more later; in the meantime, some other options that you may wish to try out are listed in the following table:

| Option name | Value | Purpose |
|---|---|---|
| CURLOPT_FAILONERROR | TRUE or FALSE | If a response code of 400 or greater is returned, cURL treats the request as failed instead of returning the page. |
| CURLOPT_FOLLOWLOCATION | TRUE or FALSE | If Location: headers are sent by the server, follow the location. |
| CURLOPT_USERAGENT | A user agent string, for example: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:15.0) Gecko/20100101 Firefox/15.0.1' | Sending the user agent string in your request tells the target server which client is requesting the resource. Since many servers will only respond to 'legitimate' requests, it is advisable to include one. |
| CURLOPT_HTTPHEADER | An array containing header information, for example: array('Cache-Control: max-age=0', 'Connection: keep-alive', 'Keep-Alive: 300', 'Accept-Language: en-us,en;q=0.5') | This option is used to send header information with the request. We will come across use cases for this in later recipes. |

A full listing of cURL options can be found on the PHP website at http://php.net/manual/en/function.curl-setopt.php.
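
To illustrate how these options might be combined, here is a minimal sketch that gathers them into a single array and applies them with curl_setopt_array(). The function names buildCurlOptions() and curlGetWithOptions() are assumptions made for illustration, and the particular mix of options and header values is only an example.

```php
<?php

// Hypothetical helper: collects the options discussed above into one array.
function buildCurlOptions($url) {
    return array(
        CURLOPT_RETURNTRANSFER => TRUE, // Return the page as a string
        CURLOPT_URL            => $url, // The URL to request
        CURLOPT_FAILONERROR    => TRUE, // Treat response codes of 400 or greater as failures
        CURLOPT_FOLLOWLOCATION => TRUE, // Follow Location: headers
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:15.0) Gecko/20100101 Firefox/15.0.1',
        CURLOPT_HTTPHEADER     => array(
            'Cache-Control: max-age=0',
            'Accept-Language: en-us,en;q=0.5',
        ),
    );
}

// Hypothetical variant of curlGet() that applies all of the options at once.
function curlGetWithOptions($url) {
    $ch = curl_init(); // Initialising cURL session

    curl_setopt_array($ch, buildCurlOptions($url)); // Setting all options in one call

    $results = curl_exec($ch); // Executing cURL session

    curl_close($ch); // Closing cURL session

    return $results;
}
```

Keeping the options in one array makes it easy to see, and later adjust, exactly what is being sent with each request.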

The HTTP response code

An HTTP response code is a three-digit number returned by the server that indicates the result of an HTTP request. Some common response code values are as follows:

  • 200: OK
  • 301: Moved Permanently
  • 400: Bad Request
  • 401: Unauthorized
  • 403: Forbidden
  • 404: Not Found
  • 500: Internal Server Error
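
The first digit of a response code identifies its general class, which a scraper can use to decide what to do next. The following is a small sketch of that idea; the function name classifyResponseCode() and the labels it returns are assumptions made for illustration.

```php
<?php

// Hypothetical helper: groups a response code by its first digit.
function classifyResponseCode($code) {
    if ($code >= 200 && $code < 300) {
        return 'success';      // For example, 200 OK
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';     // For example, 301 Moved Permanently
    }
    if ($code >= 400 && $code < 500) {
        return 'client error'; // For example, 404 Not Found
    }
    if ($code >= 500 && $code < 600) {
        return 'server error'; // For example, 500 Internal Server Error
    }
    return 'unknown';
}

foreach (array(200, 301, 404, 500) as $code) {
    echo $code, ': ', classifyResponseCode($code), "\n";
}
// Prints:
// 200: success
// 301: redirect
// 404: client error
// 500: server error
```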

Summary

This article covered how to make a simple cURL request. It is often useful to have our scrapers respond differently to different response codes, for example, by letting us know if a web page has moved, is no longer accessible, or requires authorization that we do not have.

To do this, we can read the response code of a cURL request by adding the following line to our function, after the call to curl_exec() and before the call to curl_close(). It stores the response code in the $httpResponse variable:

    $httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE);
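
As a sketch of where that line could sit, the following variant of the function returns the response code together with the page. The function name curlGetWithCode(), the 10-second connection timeout, and the returned array layout are assumptions made for illustration.

```php
<?php

// Hypothetical variant of curlGet() that also returns the response code.
function curlGetWithCode($url) {

    $ch = curl_init(); // Initialising cURL session

    // Setting cURL options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // Stop trying to connect after 10 seconds

    $results = curl_exec($ch); // Executing cURL session

    // Read the response code after curl_exec() and before curl_close().
    // The value is 0 if no response was received at all.
    $httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    curl_close($ch); // Closing cURL session

    return array('code' => $httpResponse, 'page' => $results);
}
```

The calling script can then inspect 'code' before deciding whether the contents of 'page' are worth scraping.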

About the Author


Jacob Ward

Jacob Ward is a freelance software developer based in the UK. Through his background in research marketing and analytics he realized the importance of data and automation, which led him to his current vocation, developing enterprise-level automation tools, web bots, and screen scrapers for a wide range of international clients.
