Solution to exercises on query languages

SQL and SPARQL

Use the dataset on species, and for each of the exercises below, write both an SQL-query and a SPARQL-query:

  1. Find the genus of all species locally extinct (i.e. having local_conservation equal to 0.

SQL:

SELECT genus
FROM species.description
WHERE local_conservation = 0;

SPARQL:

PREFIX exd: <http://example.org/species/desc/> 

SELECT ?genus
WHERE {
  ?s exd:genus ?genus ;
     exd:local_conservation 0 .
}
  1. Find the genus and common name (if it exists) for all species having a local_conservation smaller than global_conservation. Order the result on genus alphabetically.

SQL:

SELECT genus, common_name
FROM species.description
WHERE local_conservation < global_conservation;

SPARQL:

PREFIX exd: <http://example.org/species/desc/> 

SELECT ?genus ?common_name
WHERE {
  ?s exd:genus ?genus ;
     exd:local_conservation ?lc ;
     exd:global_conservation ?gc .
  OPTIONAL {   
     ?s exd:common_name ?common_name .
  }
  FILTER (?lc < ?gc) 
}
  1. Find out how many observations there are for species in each of the global conservation statuses (e.g. how many observations are there of species with conservation status 0, how many for those with conservation status 1, and so on). Hint: Use grouping.

SQL:

SELECT d.global_conservation, count(o.species)
FROM species.description AS d
     LEFT JOIN species.observation AS o ON (d.sid = o.species)
GROUP BY d.global_conservation;

SPARQL:

PREFIX exd: <http://example.org/species/desc/> 
PREFIX exo: <http://example.org/species/obs/> 

SELECT ?gc (COUNT(?obs) AS ?number)
WHERE {
  ?s exd:global_conservation ?gc .
  OPTIONAL {
    ?obs exo:species ?s .
  }
}
GROUP BY ?gc
  1. Find the name (i.e. the rdfs:label) of all mothers.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX exo: <http://example.org/ont/> 

SELECT ?name 
WHERE {
  ?p exo:hasMother/rdfs:label ?name .
}
  1. Find find the name of all parents (i.e. both mothers and fathers)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX exo: <http://example.org/ont/> 

SELECT ?name 
WHERE {
  ?p (exo:hasMother|exo:hasFather)/rdfs:label ?name .
}
  1. Find the names of all ancestors of ex:bob
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
PREFIX exo: <http://example.org/ont/> 
PREFIX exr: <http://example.org/res/> 

SELECT ?name 
WHERE {
  ?bob rdfs:label "Bob" ;
       (exo:hasMother|exo:hasFather)+/rdfs:label ?name .
}

Issues with SQL

WITH RECURSIVE
  parents AS (                         -- Need to merge hasMother and hasFather into hasParent,
    SELECT id, hasMother AS hasParent  -- as a table one recurse on can only occur once
    FROM people
    UNION ALL
    SELECT id, hasFather AS hasParent
    FROM people
  ),
  ancestors_ids AS (
    SELECT pa.id, pa.hasParent
    FROM people AS pe JOIN parents AS pa USING (id)
    WHERE pe.name = 'Bob'
    UNION ALL
    SELECT pa.id, pa.hasParent
    FROM ancestors_ids AS ai JOIN parents AS pa ON (ai.hasParent = pa.id)
  )
SELECT pe.name
FROM ancestors_ids AS ai JOIN 
     people AS pe ON (ai.hasParent = pe.id);

This query is quite complex. It is also not very efficient for large family trees. So even though most or even every type of statement can be represented with relations/tables, this is still not always a good idea.

Issues with SPARQL

1

It is not too dificult to query lists (and other complex/nested structures) using property-paths:

PREFIX ex: <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?n
WHERE { ex:peter ex:knowsTheNumbers/(rdf:rest*/rdf:first) ?n . }

The property path rdf:rest*/rdf:first matches all the paths of the form rdf:first, rdf:rest/rdf:first, rdf:rest/rdf:rest/rdf:first, and so on.

2

The graph-patterns in SPARQL can only match agains a given (final) number of variables. Since lists are represented as triples and can be arbitrary long and arbitrarily many, it is not possible to make a SPARQL-query that can concatenate two lists, as we would need to bind each element in the list to a variable to create it in the CONSTRUCT-clause. If we knew that the lists had a specific length, e.g. 2, and a specific number of lists per person, e.g. 2, we could make the following query:

PREFIX ex: <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

CONSTRUCT {
    ?person ex:knowsTheNumbers
        [ rdf:first ?e1 ;
          rdf:rest [ rdf:first ?e2 ;
                     rdf:rest [ rdf:first ?e3 ;
                                rdf:rest [ rdf:first ?e4 ;
                                           rdf:rest rdf:nil ] ] ] ] .
}
WHERE {
    ?person ex:knowsTheNumbers ?l1 .
    ?l1 rdf:first ?e1 ;
         rdf:rest [ rdf:first ?e2 ;
                    rdf:rest rdf:nil ] .
    ?person ex:knowsTheNumbers ?l2 .
    ?l2 rdf:first ?e3 ;
         rdf:rest [ rdf:first ?e4 ;
                    rdf:rest rdf:nil ] .
    FILTER (?e1 < ?e3) # First numbers are unequal => lists unequal and ordering removes permutations
}

Similarly, if we knew that all lists were shorter than a given number n, we could write a (big) union of similar queries as above (assuming only 2 lists per person). But, since we do not know the maximum length of a lists, nor how many lists each person is related to, it becomes impossible.

One could argue that one could simply introduce functions that operate on lists (such as in most other query languages), but remember that lists are represented via a collection of triples and not as a single value, which makes implementing such a function a bit more complex. Furthermore, such a function would not be applicable to other complex or nested structures, so one function per type of structure would be needed. A general solution could be to somehow be able to group and aggregate over complex/nested structures, but this is currently not possible in SPARQL.