Cooking XML with OOP

Object-Oriented Programming with PHP5

by Hasin Hayder | February 2008 | MySQL Open Source PHP

XML (Extensible Markup Language) is an important format for storing multi-purpose data. It is sometimes called a universal data format, because you can represent almost anything in it and display the data with the help of a renderer. One of the biggest advantages of XML is that it can easily be transformed from one form into another with the help of XSLT. XML data is also highly readable.

One of the great blessings of PHP5 is its excellent support for manipulating XML. PHP5 comes bundled with new XML extensions that make processing XML easy. You have a whole new SimpleXML API to read XML documents in a purely object-oriented way, and you have the DOMDocument object to parse and create XML documents. In this article by Hasin Hayder, we will learn these APIs and see how to process XML with PHP.

Formation of XML

Let us look at the structure of a common XML document in case you are totally new to XML. If you are already familiar with XML, which we strongly recommend for this article, you can skip this section.

Let's look at the following example, which represents a set of emails:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<emails>
<email>
<from>nowhere@notadomain.tld</from>
<to>unknown@unknown.tld</to>
<subject>there is no subject</subject>
<body>is it a body? oh ya</body>
</email>
</emails>

As you can see, an XML document has a small declaration at the top which states the XML version and the character encoding of the document. The encoding matters when you store text that contains non-ASCII characters. In XML, every tag you open must also be closed. (XML is stricter than HTML; you must follow the conventions.)

Let's look at another example where there are some special symbols in the data:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<emails>
<email>
<from>nowhere@notadomain.tld</from>
<to>unknown@unknown.tld</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts
& symbols]]></body>
</email>
</emails>

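This means that strings containing special characters such as & or < must either be enclosed in a CDATA section, as shown here, or have those characters escaped as entities such as &amp; and &lt;.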
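
Besides CDATA, special characters can also be escaped as entities. The following is a minimal sketch comparing the two approaches in PHP; the sample text and variable names are illustrative only, and both forms should give back the same string when parsed.

```php
<?php
// Illustrative sample text containing a special character.
$text = "oh ya, with some texts & symbols";

// Option 1: wrap the raw text in a CDATA section.
$withCdata = "<body><![CDATA[" . $text . "]]></body>";

// Option 2: escape the special characters as entities (&amp;, &lt;, ...).
$escaped = "<body>" . htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, "UTF-8") . "</body>";

$a = simplexml_load_string($withCdata);
$b = simplexml_load_string($escaped);

// Casting an element to string returns its text content in both cases.
var_dump((string) $a === $text);
var_dump((string) $b === $text);
```

Keep in mind that a CDATA section cannot itself contain the sequence ]]>, so escaping is the safer choice for arbitrary input.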

Again, each element may have some attributes. For example, consider the following XML where we describe the properties of a student:

<student age="17" class="11" title="Mr.">Ozniak</student>

In the above example, the student tag has three attributes: age, class, and title. Using PHP we can easily manipulate them too. In the coming sections we will learn how to parse XML documents and how to create XML documents on the fly.
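
As a small preview of the SimpleXML API covered in the following sections, here is a sketch of how the student element might be read in PHP. The variable names are illustrative; attributes are read with array-style access and the element text with a string cast.

```php
<?php
$xml = '<student age="17" class="11" title="Mr.">Ozniak</student>';
$student = simplexml_load_string($xml);

// Attributes are read like array keys; casts turn them into plain values.
$age   = (int) $student['age'];
$class = (string) $student['class'];
$title = (string) $student['title'];

// Casting the element itself gives its text content.
$name = (string) $student;

echo "$title $name, age $age, class $class\n";
```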

Introduction to SimpleXML

In PHP4 there were two ways to parse XML documents, and these are also available in PHP5. One is parsing documents via SAX (which is a standard) and the other is DOM. However, SAX parsing is event-driven, so it takes quite a lot of code, and quite a long time to write it, before you can get at your data.

In PHP5 a new API has been introduced to parse XML documents easily: the SimpleXML API. Using SimpleXML you can turn your XML document into a tree of objects. Each node is converted into an accessible form for easy parsing.

Parsing Documents

In this section we will learn how to parse basic XML documents using SimpleXML. Let's take a breath and start.

<?php
$str = <<< END
<emails>
<email>
<from>nowhere@notadomain.tld</from>
<to>unknown@unknown.tld</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts &
symbols]]></body>
</email>
</emails>
END;
$sxml = simplexml_load_string($str);
print_r($sxml);
?>

The output is like this:

SimpleXMLElement Object
(
[email] => SimpleXMLElement Object
(
[from] => nowhere@notadomain.tld
[to] => unknown@unknown.tld
[subject] => there is no subject
[body] => SimpleXMLElement Object
(
)

)

)

So how do you access each of these properties individually? You can access each of them like a property of an object. For example, $sxml->email[0] returns the first email object. To access the from element under this email, you can use the following code:

echo $sxml->email[0]->from;

So each element, unless it occurs more than once, can be accessed just by its name. Otherwise you have to access the elements like a collection. For example, if you have multiple email elements, you can access each of them using a foreach loop:

foreach ($sxml->email as $email)
echo $email->from;
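
To make the collection behaviour concrete, here is a minimal sketch with two email elements. The sample addresses and subjects are made up for illustration; the sketch counts the elements and collects each sender as a plain string.

```php
<?php
// Hypothetical sample document with two <email> elements.
$str = <<<END
<emails>
  <email>
    <from>first@notadomain.tld</from>
    <subject>first subject</subject>
  </email>
  <email>
    <from>second@notadomain.tld</from>
    <subject>second subject</subject>
  </email>
</emails>
END;

$sxml = simplexml_load_string($str);

// count() reports how many <email> children exist.
$total = count($sxml->email);

// Cast each node to string to get plain text values instead of objects.
$senders = array();
foreach ($sxml->email as $email) {
    $senders[] = (string) $email->from;
}

echo $total, "\n";
echo implode(", ", $senders), "\n";
```

The explicit (string) cast matters when the values are stored or compared later, because each node is otherwise still a SimpleXMLElement object.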

Accessing Attributes

As we saw earlier, XML nodes may have attributes. Remember the example document with class, age, and title? You can easily access these attributes using the SimpleXML API. Let's look at the following example:

<?php
$str = <<< END
<emails>
<email type="mime">
<from>nowhere@notadomain.tld</from>
<to>unknown@unknown.tld</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts &
symbols]]></body>
</email>

</emails>
END;
$sxml = simplexml_load_string($str);

foreach ($sxml->email as $email)
echo $email['type'];

?>

This will display the text mime in the output. If you look carefully, you will see that child nodes are accessed like properties of an object, and attributes are accessed like keys of an array. SimpleXML makes XML parsing really fun.
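
When the attribute names are not known in advance, they can be walked with the attributes() method. The following is a small sketch; the priority attribute and the charset lookup are added here purely for illustration.

```php
<?php
// Sample element with two attributes; "priority" is an illustrative addition.
$str = '<emails><email type="mime" priority="high">'
     . '<from>nowhere@notadomain.tld</from></email></emails>';
$sxml = simplexml_load_string($str);

// attributes() yields name => value pairs for the element.
$attributes = array();
foreach ($sxml->email[0]->attributes() as $name => $value) {
    $attributes[$name] = (string) $value;
}
print_r($attributes);

// isset() can be used to check whether an attribute is present.
$hasCharset = isset($sxml->email[0]['charset']);
var_dump($hasCharset);
```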

Parsing Flickr Feeds using SimpleXML

How about adding some milk and sugar to your coffee? So far we have learned what SimpleXML API is and how to make use of it. It would be much better if we could see a practical example. In this example we will parse the Flickr feeds and display the pictures. Sounds cool? Let's do it.

If you are interested in what the Flickr public photo feed looks like, here is the content. The feed data is collected from http://www.flickr.com/services/feeds/photos_public.gne:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:dc="http://purl.org/dc/elements/1.1/" >

<title>Everyone's photos</title>
<link rel="self"
href="http://www.flickr.com/services/feeds/photos_public.gne" />
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/"/>
<id>tag:flickr.com,2005:/photos/public</id>
<icon>http://www.flickr.com/images/buddyicon.jpg</icon>
<subtitle></subtitle>
<updated>2007-07-18T12:44:52Z</updated>
<generator uri="http://www.flickr.com/">Flickr</generator>

<entry>
<title>A-lounge 9.07_6</title>
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/dimitranova/845455130/"/>
<id>tag:flickr.com,2005:/photo/845455130</id>
<published>2007-07-18T12:44:52Z</published>
<updated>2007-07-18T12:44:52Z</updated>
<dc:date.Taken>2007-07-09T14:22:55-08:00</dc:date.Taken>
<content type="html">&lt;p&gt;&lt;a
href=&quot;http://www.flickr.com/people/dimitranova/&quot;
&gt;Dimitranova&lt;/a&gt; posted a photo:&lt;/p&gt;

&lt;p&gt;&lt;a
href=&quot;http://www.flickr.com/photos/dimitranova/845455130/
&quot; title=&quot;A-lounge 9.07_6&quot;&gt;&lt;img src=&quot;
http://farm2.static.flickr.com/1285/845455130_dce61d101f_m.jpg
&quot; width=&quot;180&quot; height=&quot;240&quot; alt=&quot;
A-lounge 9.07_6&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

</content>
<author>
<name>Dimitranova</name>
<uri>http://www.flickr.com/people/dimitranova/</uri>
</author>
<link rel="license" type="text/html" href="deed.en-us" />
<link rel="enclosure" type="image/jpeg"
href="http://farm2.static.flickr.com/1285/
845455130_7ef3a3415d_o.jpg" />

</entry>
<entry>
<title>DSC00375</title>
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/53395103@N00/845454986/"/>
<id>tag:flickr.com,2005:/photo/845454986</id>
<published>2007-07-18T12:44:50Z</published>
...
</entry>
</feed>

Now we will extract the description from each entry and display it. Let's have some fun:

<?php
$content = file_get_contents(
"http://www.flickr.com/services/feeds/photos_public.gne");

$sx = simplexml_load_string($content);
foreach ($sx->entry as $entry)
{
echo "<a href='{$entry->link['href']}'>".$entry->title."</a><br/>";
echo $entry->content."<br/>";
}
?>

See how easy SimpleXML is? For each entry in the feed, the script prints the photo title as a link, followed by the HTML content of that entry.
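
The feed shown above also contains an element with the dc: namespace prefix and several link elements per entry. The sketch below shows one way these could be read; it uses a trimmed-down inline sample in the shape of the feed rather than live data, and the variable names are illustrative.

```php
<?php
// Trimmed-down sample in the shape of the feed shown above (not live data).
$content = <<<FEED
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <entry>
    <title>A-lounge 9.07_6</title>
    <link rel="alternate" type="text/html"
          href="http://www.flickr.com/photos/dimitranova/845455130/"/>
    <dc:date.Taken>2007-07-09T14:22:55-08:00</dc:date.Taken>
    <link rel="enclosure" type="image/jpeg"
          href="http://farm2.static.flickr.com/1285/845455130_7ef3a3415d_o.jpg"/>
  </entry>
</feed>
FEED;

$sx = simplexml_load_string($content);

foreach ($sx->entry as $entry) {
    // Elements with the dc: prefix live in another namespace and are
    // reached through children() with that namespace URI.
    $dc = $entry->children("http://purl.org/dc/elements/1.1/");
    $taken = (string) $dc->{"date.Taken"};

    // Pick the <link> whose rel attribute is "enclosure" (the image file).
    $image = null;
    foreach ($entry->link as $link) {
        if ((string) $link["rel"] === "enclosure") {
            $image = (string) $link["href"];
        }
    }

    echo $entry->title, " | ", $taken, " | ", $image, "\n";
}
```

The curly-brace syntax is needed for date.Taken because the element name contains a dot, which is not valid in a plain PHP property name.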

Managing CDATA Sections using SimpleXML

As we said before, some symbols can't appear directly in the value of a node unless you enclose them in a CDATA section. For example, take a look at the following example:

<?php
$str = <<<EOT
<data>
<content>text & images </content>

</data>

EOT;
$s = simplexml_load_string($str);
?>

This will generate the following error:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error : xmlParseEntityRef: no name in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10

Warning: simplexml_load_string() [function.simplexml-load-string]: <content>text & images </content> in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10

Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10
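
When the XML comes from a source you do not control, warnings like these can be collected instead of printed. The following is a minimal sketch using PHP's libxml_use_internal_errors() and libxml_get_errors() functions; the variable names and message format are illustrative.

```php
<?php
$str = "<data><content>text & images </content></data>";

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$s = simplexml_load_string($str);
$messages = array();
if ($s === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with message, line and column.
        $messages[] = sprintf("line %d: %s", $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

print_r($messages);
```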

To avoid this problem we have to enclose the text in a CDATA section. Let's rewrite it like this:

<data>
<content><![CDATA[text & images ]]></content>
</data>

Now it will work perfectly. And you don't have to do any extra work for managing this CDATA section.

<?php
$str = <<<EOT
<data>
<content><![CDATA[text & images ]]></content>

</data>

EOT;
$s = simplexml_load_string($str);
echo $s->content; // prints "text & images "
?>

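However, if the CDATA text does not show up where you expect it (in the print_r() output shown earlier, for example, the body element appeared as an empty object), you can load the document with the LIBXML_NOCDATA option, which merges CDATA sections into ordinary text nodes (the options argument is available from PHP 5.1):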

$s = simplexml_load_string($str,null,LIBXML_NOCDATA);
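
To see what the option changes, the same string can be parsed with and without it. This is a sketch only; the variable names are illustrative, and the exact print_r() layout may differ between PHP versions.

```php
<?php
$str = "<data><content><![CDATA[text & images ]]></content></data>";

// Default parsing keeps the CDATA section as its own node type.
$plain = simplexml_load_string($str);

// LIBXML_NOCDATA asks libxml to merge CDATA sections into text nodes.
$merged = simplexml_load_string($str, "SimpleXMLElement", LIBXML_NOCDATA);

// An explicit string cast gives the same text in both cases.
var_dump((string) $plain->content === (string) $merged->content);

// Structure dumps such as print_r() show the text for the merged version.
print_r($merged);
```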

XPath

Another nice feature of SimpleXML is that you can query documents using XPath. So what is XPath? It's an expression language that helps you locate specific nodes using path-like expressions. In this section we will learn how to locate a specific part of our XML documents using SimpleXML and XPath. Let's have a look at the following XML:

<?xml version="1.0" encoding="utf-8"?>
<roles>
<task type="analysis">
<state name="new">
<assigned to="cto">
<action newstate="clarify" assignedto="pm">
<notify>pm</notify>
<notify>cto</notify>
</action>
</assigned>
</state>
<state name="clarify">
<assigned to="pm">
<action newstate="clarified" assignedto="pm">
<notify>cto</notify>
</action>
</assigned>
</state>
</task>

</roles>

This document describes the workflow of an analysis task: for each state and assignee, it says what to do next. Suppose you want to find out what to do when the task type is analysis, the task is assigned to cto, and the current state is new. SimpleXML makes it really easy. Let's take a look at the following code:

<?php
$str = <<< EOT
<roles>
<task type="analysis">
<state name="new">
<assigned to="cto">
<action newstate="clarify" assignedto="pm">
<notify>pm</notify>
<notify>cto</notify>
</action>
</assigned>
</state>
<state name="clarify">
<assigned to="pm">
<action newstate="clarified" assignedto="pm">
<notify>cto</notify>
</action>
</assigned>
</state>
</task>

</roles>
EOT;

$s = simplexml_load_string($str);
$node = $s->xpath(
"//task[@type='analysis']/state[@name='new']/assigned[@to='cto']");
echo $node[0]->action[0]['newstate']."\n";
echo $node[0]->action[0]->notify[0];
?>

This will echo the following:

clarify
pm
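
The xpath() method returns an array of matching elements, which is empty when nothing matches, so it is worth checking the result before indexing into it. A minimal sketch that lists every notify value for the matched node, using a shortened copy of the document and illustrative variable names:

```php
<?php
$str = <<<EOT
<roles>
  <task type="analysis">
    <state name="new">
      <assigned to="cto">
        <action newstate="clarify" assignedto="pm">
          <notify>pm</notify>
          <notify>cto</notify>
        </action>
      </assigned>
    </state>
  </task>
</roles>
EOT;

$s = simplexml_load_string($str);
$nodes = $s->xpath("//task[@type='analysis']/state[@name='new']/assigned[@to='cto']");

$recipients = array();
// Guard against an empty result before touching $nodes[0].
if (!empty($nodes)) {
    foreach ($nodes[0]->action[0]->notify as $notify) {
        $recipients[] = (string) $notify;
    }
}

echo implode(", ", $recipients), "\n";
```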

However, there is something to remember while writing XPath. A single / selects direct children only, so the steps in your expression must follow the exact nesting of your XML document, while a double // selects matching nodes at any depth. For example:

echo count($s->xpath("//state"));

This will output 2.

//state means take the state nodes from anywhere in the document. If you specify task//state, it will return all states anywhere under the task nodes. For example, the following code will output 3 twice (printed as 33, since nothing separates the two values):

echo count($s->xpath("//notify"));
echo count($s->xpath("task//notify"));

Now what if you want to find notify just under action, which is under assigned, which is under state? Your XPath query should be //state/assigned/action/notify.

But if you want the path to start exactly at the task node that sits just under the root node roles, the query should be /roles/task/state/assigned/action/notify, because an absolute path begins at the root element.

If you need to match attributes, write them as [@AttributeName1='value'][@AttributeName2='value']. The following XPath should make it clear:

//task[@type='analysis']/state[@name='new']/assigned[@to='cto']
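
The different path forms discussed in this section can be compared side by side by counting their matches. The sketch below runs several queries against the roles document; the list of queries and the variable names are illustrative. Note that an absolute path starts at the document root, so it begins with the root element roles.

```php
<?php
$str = <<<EOT
<roles>
<task type="analysis">
<state name="new">
<assigned to="cto">
<action newstate="clarify" assignedto="pm">
<notify>pm</notify>
<notify>cto</notify>
</action>
</assigned>
</state>
<state name="clarify">
<assigned to="pm">
<action newstate="clarified" assignedto="pm">
<notify>cto</notify>
</action>
</assigned>
</state>
</task>
</roles>
EOT;

$s = simplexml_load_string($str);

$queries = array(
    "//state",                                          // any depth
    "//notify",                                         // any depth
    "task//notify",                                     // relative to <roles>
    "//state/assigned/action/notify",                   // exact nesting
    "/roles/task/state/assigned/action/notify",         // absolute path
    "//action[@newstate='clarify'][@assignedto='pm']",  // two attributes
);

$counts = array();
foreach ($queries as $query) {
    $counts[$query] = count($s->xpath($query));
    echo str_pad($query, 52), $counts[$query], "\n";
}
```

With this document, //state matches 2 nodes, the four notify queries match 3 nodes each, and the two-attribute query matches 1 node.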

DOM API

SimpleXML is geared towards parsing documents and offers only limited support for building them. For creating XML documents on the fly you can use the DOM API that comes bundled with PHP 5. Using the DOM API you can also create page-scraping tools fairly easily.

In this section we will learn how to create XML documents using DOM API, and then we will learn how to parse existing documents and modify them.

In the following example we will create just a basic HTML file:

<?php
$doc = new DOMDocument("1.0","UTF-8");
$html = $doc->createElement("html");
$body = $doc->createElement("body");
$h1 = $doc->createElement("h1","OOP with PHP");
$body->appendChild($h1);
$html->appendChild($body);
$doc->appendChild($html);

echo $doc->saveHTML();
?>

This will produce the following code:

<html>
<body>
<h1>OOP with PHP</h1>
</body>
</html>

That's fairly easy, right?

Let's do some more:

<?php
$doc = new DOMDocument("1.0","UTF-8");
$html = $doc->createElement("html");
$body = $doc->createElement("body");
$h1 = $doc->createElement("h1","OOP with PHP");
$h1->setAttribute("id","firsth1");
$p = $doc->createElement("p");
$p->appendChild($doc->createTextNode("Hi - how about some text?"));
$body->appendChild($h1);
$body->appendChild($p);
$html->appendChild($body);
$doc->appendChild($html);

echo $doc->saveHTML();
?>

This will produce the following code.

<html><body>
<h1 id="firsth1">OOP with PHP</h1>
<p>Hi - how about some text?</p>
</body></html>

You can save the markup generated by the DOM engine to a file in your file system using the following code:

file_put_contents("c:/abc.xml", $doc->saveHTML());
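
The examples so far produce HTML with saveHTML(). For an XML document, the same building blocks can be combined with saveXML(). The sketch below rebuilds the email document from the start of this article; it is one possible approach rather than the only one, and the variable names are illustrative.

```php
<?php
$doc = new DOMDocument("1.0", "UTF-8");
$doc->formatOutput = true; // indent the generated XML for readability

$emails = $doc->createElement("emails");
$email  = $doc->createElement("email");
$email->setAttribute("type", "mime");

$email->appendChild($doc->createElement("from", "nowhere@notadomain.tld"));
$email->appendChild($doc->createElement("to", "unknown@unknown.tld"));

// Text with special characters goes into a CDATA section node.
$body = $doc->createElement("body");
$body->appendChild(
    $doc->createCDATASection("is it a body? oh ya, with some texts & symbols")
);
$email->appendChild($body);

$emails->appendChild($email);
$doc->appendChild($emails);

$xml = $doc->saveXML(); // returns the whole document as an XML string
echo $xml;

// The generated string can be read back with SimpleXML.
$check = simplexml_load_string($xml);
echo $check->email[0]->body, "\n";
```

To write the result straight to disk, DOMDocument also provides a save() method that takes a file name.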

Modifying Existing Documents

The DOM API helps you create XML documents easily, and it also provides easy ways to load and modify existing documents. With the following code we will load the file we created a few minutes ago and then change the heading text of the first h1 element:

<?php

$uri = 'c:/abc.xml';
$document = new DOMDocument();
$document->loadHTMLFile($uri); // load the content of this file as HTML
$h1s = $document->getElementsByTagName("h1"); // find all h1 elements
$oldH1 = $h1s->item(0); // keep a reference to the existing h1 element
$newH1 = $document->createElement("h1", "New Heading"); // create a new h1 element
$oldH1->parentNode->insertBefore($newH1, $oldH1); // insert it before the existing h1
$oldH1->parentNode->removeChild($oldH1); // remove the old h1 element
echo $document->saveHTML(); // display the content as HTML

?>

The output is shown below:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" 
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<h1>New Heading</h1>
<p>Hi - how about some text?</p>
</body></html>

Other Useful Functions

There are some other useful functions in the DOM library. We are not going to discuss them in depth; however, they are listed here with a one-line overview of each.

  • DOMElement->setAttribute(): Sets the value of an attribute on an element
  • DOMNode->hasChildNodes(): Returns true if a DOM node has child nodes
  • DOMNode->replaceChild(): Replaces a child node with another one
  • DOMNode->cloneNode(): Creates a copy of the current node (pass true for a deep copy)
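
The four functions listed above can be tried together in a few lines. The following is a minimal sketch on a made-up list document; the element names and attribute values are illustrative.

```php
<?php
$doc = new DOMDocument("1.0", "UTF-8");
$doc->loadXML("<list><item>first</item><item>second</item></list>");

$root  = $doc->documentElement;
$items = $root->getElementsByTagName("item");
$first = $items->item(0);

// setAttribute(): add or change an attribute on an element.
$first->setAttribute("id", "a1");

// hasChildNodes(): true here, because the element contains a text node.
$hasChildren = $first->hasChildNodes();

// cloneNode(true): copy the node together with its attributes and children.
$copy = $first->cloneNode(true);
$copy->setAttribute("id", "a2");

// replaceChild(new, old): swap the second <item> for the copy.
$root->replaceChild($copy, $items->item(1));

$result = $doc->saveXML($root); // serialize only the root element
echo $result, "\n";
```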

Summary

The XML APIs in PHP5 play a very important role in web application development, most notably the new SimpleXML API, which greatly simplifies parsing. Today XML is one of the most widely used data formats in large applications. Getting familiar with the XML APIs and related technologies will therefore help you design robust XML-based applications more easily.

Object-Oriented Programming with PHP5 Learn to leverage PHP5's OOP features to write manageable applications with ease
Published: December 2007

About the Author

Hasin Hayder

Hasin Hayder graduated in Civil Engineering from the Rajshahi University of Engineering and Technology (RUET) in Bangladesh. He is a Zend-certified Engineer and an expert in developing localized applications. He is currently working as a Technical Director at Trippert Labs and managing the local branch in Bangladesh. Besides his full-time job, Hasin writes his blog at http://hasin.wordpress.com, writes articles for different websites, and maintains his open source framework Orchid at http://orchid.phpxperts.net. Hasin lives in Bangladesh with his wife Ayesha and his son, Afif.
