Solution to exercises on query languages
SQL and SPARQL
Use the dataset on species, and for each of the exercises below, write both an SQL-query and a SPARQL-query:
- Find the genus of all species locally extinct (i.e. having
local_conservation
equal to0
.
SQL:
SELECT genus
FROM species.description
WHERE local_conservation = 0;
SPARQL:
<http://example.org/species/desc/>
PREFIX exd:
SELECT ?genus
WHERE {
:genus ?genus ;
?s exd:local_conservation 0 .
exd }
- Find the genus and common name (if it exists) for all species having
a
local_conservation
smaller thanglobal_conservation
. Order the result on genus alphabetically.
SQL:
SELECT genus, common_name
FROM species.description
WHERE local_conservation < global_conservation;
SPARQL:
<http://example.org/species/desc/>
PREFIX exd:
SELECT ?genus ?common_name
WHERE {
:genus ?genus ;
?s exd:local_conservation ?lc ;
exd:global_conservation ?gc .
exd
OPTIONAL { :common_name ?common_name .
?s exd
}FILTER (?lc < ?gc)
}
- Find out how many observations there are for species in each of the
global conservation statuses (e.g. how many observations are there of
species with conservation status
0
, how many for those with conservation status1
, and so on). Hint: Use grouping.
SQL:
SELECT d.global_conservation, count(o.species)
FROM species.description AS d
LEFT JOIN species.observation AS o ON (d.sid = o.species)
GROUP BY d.global_conservation;
SPARQL:
<http://example.org/species/desc/>
PREFIX exd: <http://example.org/species/obs/>
PREFIX exo:
SELECT ?gc (COUNT(?obs) AS ?number)
WHERE {
:global_conservation ?gc .
?s exd
OPTIONAL {:species ?s .
?obs exo
}
}GROUP BY ?gc
- Find the name (i.e. the
rdfs:label
) of all mothers.
<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdfs: <http://example.org/ont/>
PREFIX exo:
SELECT ?name
WHERE {
:hasMother/rdfs:label ?name .
?p exo }
- Find find the name of all parents (i.e. both mothers and fathers)
<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdfs: <http://example.org/ont/>
PREFIX exo:
SELECT ?name
WHERE {
:hasMother|exo:hasFather)/rdfs:label ?name .
?p (exo }
- Find the names of all ancestors of
ex:bob
<http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdfs: <http://example.org/ont/>
PREFIX exo: <http://example.org/res/>
PREFIX exr:
SELECT ?name
WHERE {
:label "Bob" ;
?bob rdfs:hasMother|exo:hasFather)+/rdfs:label ?name .
(exo }
Issues with SQL
WITH RECURSIVE
AS ( -- Need to merge hasMother and hasFather into hasParent,
parents SELECT id, hasMother AS hasParent -- as a table one recurse on can only occur once
FROM people
UNION ALL
SELECT id, hasFather AS hasParent
FROM people
),AS (
ancestors_ids SELECT pa.id, pa.hasParent
FROM people AS pe JOIN parents AS pa USING (id)
WHERE pe.name = 'Bob'
UNION ALL
SELECT pa.id, pa.hasParent
FROM ancestors_ids AS ai JOIN parents AS pa ON (ai.hasParent = pa.id)
)SELECT pe.name
FROM ancestors_ids AS ai JOIN
AS pe ON (ai.hasParent = pe.id); people
This query is quite complex. It is also not very efficient for large family trees. So even though most or even every type of statement can be represented with relations/tables, this is still not always a good idea.
Issues with SPARQL
1
It is not too dificult to query lists (and other complex/nested structures) using property-paths:
PREFIX ex: <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?n
WHERE { ex:peter ex:knowsTheNumbers/(rdf:rest*/rdf:first) ?n . }
The property path rdf:rest*/rdf:first
matches all the
paths of the form rdf:first
,
rdf:rest/rdf:first
,
rdf:rest/rdf:rest/rdf:first
, and so on.
2
The graph-patterns in SPARQL can only match agains a given (final)
number of variables. Since lists are represented as triples and can be
arbitrary long and arbitrarily many, it is not possible to make a
SPARQL-query that can concatenate two lists, as we would need to bind
each element in the list to a variable to create it in the
CONSTRUCT
-clause. If we knew that the lists had a specific
length, e.g. 2, and a specific number of lists per person, e.g. 2, we
could make the following query:
PREFIX ex: <http://example.org/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
CONSTRUCT {
?person ex:knowsTheNumbers
[ rdf:first ?e1 ;
rdf:rest [ rdf:first ?e2 ;
rdf:rest [ rdf:first ?e3 ;
rdf:rest [ rdf:first ?e4 ;
rdf:rest rdf:nil ] ] ] ] .
}
WHERE {
?person ex:knowsTheNumbers ?l1 .
?l1 rdf:first ?e1 ;
rdf:rest [ rdf:first ?e2 ;
rdf:rest rdf:nil ] .
?person ex:knowsTheNumbers ?l2 .
?l2 rdf:first ?e3 ;
rdf:rest [ rdf:first ?e4 ;
rdf:rest rdf:nil ] .
FILTER (?e1 < ?e3) # First numbers are unequal => lists unequal and ordering removes permutations
}
Similarly, if we knew that all lists were shorter than a given number
n
, we could write a (big) union of similar queries as above
(assuming only 2 lists per person). But, since we do not know the
maximum length of a lists, nor how many lists each person is related to,
it becomes impossible.
One could argue that one could simply introduce functions that operate on lists (such as in most other query languages), but remember that lists are represented via a collection of triples and not as a single value, which makes implementing such a function a bit more complex. Furthermore, such a function would not be applicable to other complex or nested structures, so one function per type of structure would be needed. A general solution could be to somehow be able to group and aggregate over complex/nested structures, but this is currently not possible in SPARQL.