Accessing and using the RDF data in Stanbol

by Reto Bachmann-Gmür | July 2013 | Open Source

Stanbol describes the annotations on the data it extracts using the Resource Description Framework (RDF). RDF is a standard that provides a highly generic and flexible data model, which is the foundation of the Semantic Web.

In this article by Reto Bachmann-Gmür, the author of Instant Apache Stanbol [Instant], we will read the data returned by Stanbol as RDF, store it, and query it.


Getting ready

To start with, we need a Stanbol instance and Node.js. Additionally, we need the rdfstore-js library and the request module that our client requires, which can be installed by executing the following command line:

> npm install rdfstore request

How to do it...

  1. We create a file rdf-client.js with the following code:

    var rdfstore = require('rdfstore');
    var request = require('request');
    var fs = require('fs');

    rdfstore.create(function(store) {
      // Sends every file to the Stanbol enhancer, loads the returned Turtle
      // into the store, and invokes callback once all files are loaded.
      function load(files, callback) {
        var filesToLoad = files.length;
        for (var i = 0; i < files.length; i++) {
          // Note: `file` is declared with `var`, so it is shared by all
          // iterations; callbacks that run after the loop has finished
          // see the last file name.
          var file = files[i];
          fs.createReadStream(file).pipe(
            request.post({
                url: 'http://localhost:8080/enhancer?uri=file:///' + file,
                headers: {accept: "text/turtle"}
              },
              function(error, response, body) {
                if (!error && response.statusCode == 200) {
                  store.load(
                    "text/turtle",
                    body,
                    function(success, results) {
                      console.log('loaded: ' + results +
                        " triples from file " + file);
                      if (--filesToLoad === 0) {
                        callback();
                      }
                    }
                  );
                } else if (error) {
                  // No response object is available if the request failed
                  console.log('Request failed: ' + error);
                } else {
                  console.log('Got status code: ' + response.statusCode);
                }
              }));
        }
      }

      load(['testdata.txt', 'testdata2.txt'], function() {
        // Query the labels of the referenced entities and their source files
        store.execute(
          "PREFIX enhancer:<http://fise.iks-project.eu/ontology/> \
          PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#> \
          SELECT ?label ?source { \
            ?a enhancer:extracted-from ?source. \
            ?a enhancer:entity-reference ?e. \
            ?e rdfs:label ?label. \
            FILTER (lang(?label) = \"en\") \
          }",
          function(success, results) {
            if (success) {
              console.log("*******************");
              for (var i = 0; i < results.length; i++) {
                console.log(results[i].label.value +
                  " in " + results[i].source.value);
              }
            }
          });
      });
    });

  2. Create the data files:

    Our client loads two files. We use a simple testdata.txt file with the following content:

    "The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley."

    And a second testdata2.txt file with the following content:

    "Bob Marley never had a concert in Vatican City."

  3. We execute the code using the Node.js command line:

    > node rdf-client.js

    The output is:

    loaded: 159 triples from file testdata2.txt
    loaded: 140 triples from file testdata2.txt
    *******************
    Vatican City in file:///testdata2.txt
    Bob Marley in file:///testdata2.txt
    Bob Marley in file:///testdata.txt
    Paris, Texas in file:///testdata.txt
    Paris in file:///testdata.txt

  4. This time we see the labels of the entities and the file in which they appear.

How it works…

Unlike the usual clients, this client no longer analyses the returned JavaScript Object Notation (JSON) but processes the returned data as RDF. An RDF document is a directed graph, and a service such as the W3C RDF Validator can render it as an image of nodes connected by labeled arrows.

We can create such an image by selecting RDF/XML as the output format on localhost:8080/enhancer, running the engines on some text, and copying and pasting the generated XML to www.w3.org/RDF/Validator/, where we can request that triples and a graph be generated from it. Triples are the other way to look at RDF. An RDF graph (or document) is a set of triples of the form subject-predicate-object, where subject and object are the nodes (vertices) and predicate is the arc (edge). Every triple is a statement describing a property of its subject:

<urn:enhancement-f488d7ce-a1b7-faa6-0582-0826854eab5e> <http://fise.iks-project.eu/ontology/entity-reference> <http://dbpedia.org/resource/Bob_Marley> .
<http://dbpedia.org/resource/Bob_Marley> <http://www.w3.org/2000/01/rdf-schema#label> "Bob Marley"@en .

These two triples say that an enhancement references Bob Marley and that the English label for Bob Marley is "Bob Marley". All the arcs and most of the nodes are labeled by an Internationalized Resource Identifier (IRI). IRIs are a superset of the good old URLs that also allows non-Latin characters.
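
To make the triple model concrete, here is a minimal sketch that holds the two triples above as plain JavaScript objects and looks up every statement about a given subject. The object layout (subject, predicate, object) and the helper name statementsAbout are assumptions chosen for illustration; rdfstore-js uses its own internal representation.

```javascript
// The two triples from the text as plain objects (illustrative layout,
// not the representation used by rdfstore-js).
const triples = [
  {
    subject: 'urn:enhancement-f488d7ce-a1b7-faa6-0582-0826854eab5e',
    predicate: 'http://fise.iks-project.eu/ontology/entity-reference',
    object: { iri: 'http://dbpedia.org/resource/Bob_Marley' }
  },
  {
    subject: 'http://dbpedia.org/resource/Bob_Marley',
    predicate: 'http://www.w3.org/2000/01/rdf-schema#label',
    object: { literal: 'Bob Marley', lang: 'en' }
  }
];

// Hypothetical helper: all statements whose subject is the given IRI.
function statementsAbout(subjectIri, graph) {
  return graph.filter(function (triple) {
    return triple.subject === subjectIri;
  });
}

const aboutBob = statementsAbout('http://dbpedia.org/resource/Bob_Marley', triples);
aboutBob.forEach(function (triple) {
  console.log(triple.predicate + ' -> ' + JSON.stringify(triple.object));
});
```

Because the object of the first triple is the subject of the second, following that IRI from one statement to the next is what turns a flat set of triples into a graph.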

RDF can be serialized in many different formats. The two triples shown above use the N-Triples syntax. RDF/XML expresses (serializes) RDF graphs as XML documents. Originally, RDF/XML was referred to as the canonical serialization for RDF. Unfortunately, this caused some people to believe that RDF is somehow related to XML and thus inherits its flaws. Turtle is a serialization format designed specifically for RDF that doesn't encode RDF into an existing format. Turtle allows the explicit listing of triples as in N-Triples, but also supports various ways of expressing graphs in a more concise and readable fashion. JSON-LD expresses RDF graphs in JSON. As this specification is currently still a work in progress (see json-ld.org/), different implementations are incompatible; for this example, we therefore switched the Accept header to text/turtle.
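
Since N-Triples simply lists one triple per line, a serializer for it is short to write. The sketch below is an illustration rather than a complete implementation: the object layout and the function names termToNTriples and toNTriples are assumptions, and only backslashes, double quotes, and line breaks in literals are escaped.

```javascript
// Serialize one term. IRIs are given either as a plain string or as
// { iri: ... }; literals as { literal: ..., lang: ... } (assumed layout).
function termToNTriples(term) {
  if (typeof term === 'string') {
    return '<' + term + '>';
  }
  if (term.iri) {
    return '<' + term.iri + '>';
  }
  const escaped = term.literal
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
  return '"' + escaped + '"' + (term.lang ? '@' + term.lang : '');
}

// Serialize a list of triples, one statement per line.
function toNTriples(graph) {
  return graph.map(function (triple) {
    return termToNTriples(triple.subject) + ' ' +
      termToNTriples(triple.predicate) + ' ' +
      termToNTriples(triple.object) + ' .';
  }).join('\n');
}

const labelTriple = {
  subject: 'http://dbpedia.org/resource/Bob_Marley',
  predicate: 'http://www.w3.org/2000/01/rdf-schema#label',
  object: { literal: 'Bob Marley', lang: 'en' }
};

console.log(toNTriples([labelTriple]));
```

Turtle would let us shorten the same statement with prefixes, which is why it is usually the more readable choice for people, while N-Triples is easier to produce and parse line by line.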

Another change in the code performing the request is that we added a uri query-parameter to the requested URL:

'http://localhost:8080/enhancer?uri=file:///' + file,

This defines the IRI used as the name for the uploaded content in the result graph. If this parameter is not specified, the enhancer generates an IRI based on a hash of the content. The output would then contain less helpful lines such as the following:

Paris in urn:content-item-sha1-3b16820497aae806f289419d541c770bbf87a796
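
To illustrate what such a content-derived name looks like, the following sketch builds an IRI of the same shape from the SHA-1 digest of a string, using Node's built-in crypto module. Which bytes Stanbol actually hashes is an assumption here, the function name contentItemIri is hypothetical, and the result is not guaranteed to match the IRI that Stanbol generates for the same text.

```javascript
const crypto = require('crypto');

// Hypothetical helper: derive a name from the content itself.
// Assumption: the digest is taken over the UTF-8 bytes of the text.
function contentItemIri(content) {
  const digest = crypto.createHash('sha1')
    .update(content, 'utf8')
    .digest('hex');
  return 'urn:content-item-sha1-' + digest;
}

console.log(contentItemIri('Bob Marley never had a concert in Vatican City.'));
```

A name derived this way is stable for identical content but says nothing about where the content came from, which is why passing an explicit uri parameter gives more readable results.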

Roughly the first half of our code takes care of sending the files to Stanbol and storing the returned RDF. We define a function load that asynchronously enhances a bunch of files and invokes a callback function when all files have successfully been loaded.
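
The countdown pattern behind load can be sketched in isolation. In this sketch the call to the enhancer is replaced by an injected function enhance(file, done), a hypothetical stand-in for the HTTP request, and forEach gives each callback its own file variable so that every log line names the file it belongs to. The names loadAll and fakeEnhance are likewise assumptions made for the example.

```javascript
// Runs enhance() for every file and calls callback once all of them
// have reported back, whether they succeeded or failed.
function loadAll(files, enhance, callback) {
  let remaining = files.length;
  const loaded = [];
  if (remaining === 0) {
    callback(loaded);
    return;
  }
  files.forEach(function (file) {
    // `file` is a separate binding for each iteration
    enhance(file, function (error, tripleCount) {
      if (error) {
        console.log('failed: ' + file + ' (' + error.message + ')');
      } else {
        console.log('loaded: ' + tripleCount + ' triples from file ' + file);
        loaded.push(file);
      }
      // Count failures too, so that the final callback always fires
      if (--remaining === 0) {
        callback(loaded);
      }
    });
  });
}

// Stub that answers asynchronously with a made-up triple count
function fakeEnhance(file, done) {
  setTimeout(function () {
    done(null, file.length);
  }, 0);
}

loadAll(['testdata.txt', 'testdata2.txt'], fakeEnhance, function (loaded) {
  console.log('all done: ' + loaded.join(', '));
});
```

Counting failed requests as finished is a design choice of this sketch; it ensures that the query step still runs on whatever data could be loaded.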

The second half of the code is the function that's executed once all files have been processed. At this point, we have all the triples loaded in the store. We could now programmatically access the triples one by one, but it's easier to just query for the data we're interested in. SPARQL is a query language a bit similar to SQL but designed to query triple stores rather than relational databases. In our program, we have the following query (slightly simplified here):

PREFIX enhancer:<http://fise.iks-project.eu/ontology/>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label ?source {
?a enhancer:extracted-from ?source.
?a enhancer:entity-reference ?e.
?e rdfs:label ?label. }

The most important part is the section between curly brackets. This is a graph pattern: it is like a graph, but with variables in place of some of the values. On execution, the SPARQL engine looks for parts of the RDF graph that match this pattern and returns a table with a column for each selected variable and a row for every matching combination of values. In our case, we iterate through the result and output the label of the entity and the document in which the entity was referenced.
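
The matching step can be illustrated with a deliberately small matcher. This is a sketch of the idea, not how rdfstore-js evaluates SPARQL: triples are three-element arrays, variables are strings starting with a question mark, the sample data and the shortened predicate names are made up, and the function names matchPattern and query are assumptions for the example.

```javascript
// A tiny made-up graph: triples are [subject, predicate, object] arrays.
const graph = [
  ['urn:enhancement-1', 'extracted-from', 'file:///testdata.txt'],
  ['urn:enhancement-1', 'entity-reference', 'dbpedia:Paris'],
  ['dbpedia:Paris', 'label', 'Paris'],
  ['urn:enhancement-2', 'extracted-from', 'file:///testdata2.txt'],
  ['urn:enhancement-2', 'entity-reference', 'dbpedia:Vatican_City'],
  ['dbpedia:Vatican_City', 'label', 'Vatican City']
];

function isVariable(term) {
  return term.charAt(0) === '?';
}

// Match one pattern against one triple, given the values bound so far.
// Returns the extended binding, or null if the triple does not fit.
function matchPattern(pattern, triple, binding) {
  const result = Object.assign({}, binding);
  for (let i = 0; i < 3; i++) {
    const term = pattern[i];
    if (isVariable(term)) {
      if (term in result && result[term] !== triple[i]) {
        return null; // variable already bound to a different value
      }
      result[term] = triple[i];
    } else if (term !== triple[i]) {
      return null; // constant does not match
    }
  }
  return result;
}

// Evaluate the patterns one after the other, keeping every combination
// of variable values that satisfies all patterns seen so far.
function query(patterns, triples) {
  let bindings = [{}];
  patterns.forEach(function (pattern) {
    const next = [];
    bindings.forEach(function (binding) {
      triples.forEach(function (triple) {
        const extended = matchPattern(pattern, triple, binding);
        if (extended) {
          next.push(extended);
        }
      });
    });
    bindings = next;
  });
  return bindings;
}

const rows = query([
  ['?a', 'extracted-from', '?source'],
  ['?a', 'entity-reference', '?e'],
  ['?e', 'label', '?label']
], graph);

rows.forEach(function (row) {
  console.log(row['?label'] + ' in ' + row['?source']);
});
```

Each element of rows corresponds to one row of the result table, and each variable to one column. A real engine adds indexing, filters such as the language test in our query, and projection to the selected variables.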

There's more...

The advantage of RDF is that many tools can deal with the data. These range from command-line tools such as rapper (librdf.org/raptor/rapper.html) for converting data, to server applications that can store large amounts of RDF data and serve as a base for building applications.

Summary

In this recipe, the advantage of processing the data as RDF (model-based) over the conventional JSON (syntax-based) method was explained. A client, rdf-client.js, was created, which sent two files, testdata.txt and testdata2.txt, to Stanbol and was executed from the Node.js command line. RDF was shown both as a graph rendered by the W3C RDF Validator and as triples. Finally, the triples were queried using SPARQL to extract the required information.


About the Author :


Reto Bachmann-Gmür

Reto Bachmann-Gmür first learned about the Web in 1996, and believed it to be the perfect opportunity to combine his passion for technology with his desire to do something socially relevant. However, he got buried under the information overload of the Web, and hardly got a chance to breathe until he finally heard about the Semantic Web in 2002. He started development on various open source projects that were designed as better means to deal with the ever-growing information, ideally of course in a decentralized fashion without any position, and which were able to control the free flow of information. He worked on Graph Versioning with the Jena team at HP Laboratories, and on Semantic Content Management with Adobe. Since he hasn't fixed the world yet, he's increasingly trying to get others to do the job, partially with his consulting firm wymiwyg.com and partially fathering his 10-year-old son, ultimately spending less time fixing and more time enjoying.

He's a member of the Apache Software Foundation and is active in various Semantic Web-related Apache projects.
