Lecture 3: Query Languages

This lecture gives an overview of different query languages, in particular SQL and SPARQL, and how they differ. We will also look at the relationship between a query language and the underlying data structure it queries.

Slides

Relevant Wiki-pages

Exercises

Set-up

  1. For SQL, use your personal database, either the one set up by USiT for you, or a database on your personal computer. You should have gotten a mail with details on your personal database at IFI.
  2. Download the example dataset on species used in the lecture here, and use psql to execute the script.
  3. For SPARQL, use e.g. Jena (we will use a different triplestore later in the course)
  4. Download the example dataset on species used in the lecture here
  5. Download the example RDF-dataset on people used in the lecture here and
  6. To execute a SPARQL-query, simply run the following command:
<path-to-jena>/bin/arq --data <RDF-file> --query <query-file>

where <path-to-jena> is the path to the unzipped Jena-folder, <RDF-file> is the file containing the RDF-data you want to query, and <query-file> is a file containing the SPARQL-query you want to execute.

SQL and SPARQL

Use the dataset on species, and for each of the exercises below, write both an SQL-query and a SPARQL-query:

  1. Find the genus of all species locally extinct (i.e. having local_conservation equal to 0.
  2. Find the genus and common name (if it exists) for all species having a local_conservation smaller than global_conservation. Order the result on genus alphabetically.
  3. Find out how many observations there are for species in each of the global conservation statuses (e.g. how many observations are there of species with conservation status 0, how many for those with conservation status 1, and so on). Hint: Use grouping.

Use the dataset on people, and for each of the exercises below, write a SPARQL-query:

  1. Find the name (i.e. the rdfs:label) of all mothers.
  2. Find find the name of all parents (i.e. both mothers and fathers)
  3. Find the names of all ancestors of ex:bob (You might get a ClassCastException when trying to execute your solutio on this, see the solution for more info on this)

Issues with SQL

An important point with the relational model is that most (all?) data can be represented as relations. However, this does not mean that it is convenient to manipulate by the query languages working on tables, such as SQL.

So, now imagine that we had represented the data in people.ttl in the following relational table:

CREATE TABLE people (
  id int PRIMARY KEY,
  name text,
  hasFather int REFERENCES people(id),
  hasMother int REFERENCES people(id)
);

where we replace IRIs of resources with integer ids. The people-table would then look as follows:

 id | name  | hasmother | hasfather 
----+-------+-----------+-----------
  1 | Blank |           |          
  2 | Peter |           |          
  3 | Mina  |           |          
  4 | Ole   |           |          
  5 | Maria |           |          
  6 | Karl  |         3 |         2
  7 | Mary  |         5 |         4
  8 | Bob   |         7 |         6

The SQL-script making this table can be downloaded here.

How would an SQL-query that finds the name of all ancestors of the person with name Ole look? (Note: difficult, need recursive queries!)

Issues with SPARQL

An important feature of RDF, similarly as with the relational model, is that most data (any?) can be represented as a collection of triples. This is equally true for complex/nested structures, such as trees, lists, and other complex data types. For example, to state that ex:peter is ex:knowsTheNumbers-related to a (linked) list with the elements 1, 2 and 3 can be represented in RDF with the following triples:

ex:peter ex:knowsTheNumbers 
  [rdf:first 1 ;
   rdf:rest [ rdf:first 2 ;
              rdf:rest [ rdf:first 3 ;
                         rdf:rest rdf:nil ] ] .

where rdf:first denotes the first element of the list, rdf:rest denotes the list of elements comming after the first (recursively), and rdf:nil is the empty list. Note that in the Turtle serialization there is a short hand syntax (syntactic sugar) for writing such lists (occuring within a triple), so one could rather write

ex:peter ex:knowsTheNumbers (1 2 3) .

in place of the triples above.

Lists are not that much used in RDF in practise, as the above statement could just as easily be represented as

ex:peter ex:knowsTheNumber 1, 2, 3 .

That is, with three triples, one for each number. However, there are some benefits to using lists (or similar structures), e.g. lists are closed, so one cannot just add statements to exted a list (one would need to delete the rdf:rest rdf:nil-triple, and replace it with the desired extension, one knows when one has read all triples describing a list (if one have triples connecting ex:peter to rdf:nil, we know we have read all triples describing the list), and finally, a list imposes an ordering of the elements (e.g. it makes sense that Peter learned 1 before 2). Similar arguments can be made for other (more domain specific) complex structures. Thus, the problems discussed in this exercise are still relevant.

So, imagine we had a graph that keeps track of the numbers different people know (i.e. can count to), but when a person learns even more numbers, new statements are simply added to the graph. So, we might have data that looks like this (this file can be downloaded here):

@prefix ex: <http://example.org/> .

ex:peter ex:knowsTheNumbers (1 2 3) .
ex:kari ex:knowsTheNumbers (1 2 3 4 5) .
ex:peter ex:knowsTheNumbers (4 5 6 7) .
ex:ole ex:knowsTheNumbers (1 2) .
ex:kari ex:knowsTheNumbers (6 7 8 9 10) .
ex:peter ex:knowsTheNumbers (8 9) .
  1. Write a SPARQL-query that finds all the numbers that ex:peter knows. (Hint: Use property paths)
  2. Write a SPARQL-query that creates a new graph (i.e. a CONSTRUCT-query) that contains only one list per person that contains all the numbers that person knows (i.e. concatenate all the lists). (Hint: Impossible! But try to find out why)

Solution

The solution to the exercises can be found here.