Lecture 11: Integration
In this lecture, we will take a closer look at data integration, different methods for data integration, and how the technologies we have used so far can be applied to integrate different datasets.
Relevant Wiki-pages
Exercises
As described in the mandatory assignment text, the city of Urb needed to integrate data from different sources, in particular the databases from Suburbia and Urbania. In the mandatory assignment, you integrated their data using direct mappings from Triplelore and a mix of ontology axioms in OWL and Lore-rules.
In other parallel universes, the city council of Urb decides on different technologies, and in these exercises we will look at some of these, namely Ontop and PostgreSQL’s foreign data wrappers.
Exercise 1: Ontop
In the first parallel universe, the city council decides that Ontop will be used for the integration. Start by downloading the following Makefile.
As usual, execute

    make setup

to download the necessary programs and data and to create the needed folders. The Makefile downloads Ontop for you. Update the DB-credentials in the Makefile and execute

    make relational_dbs

to load the data into your database. Note that the data is the same as for the oblig, but the Makefile now also loads the data from data/organizations.csv into a table suburbia.organization.
Now, make Ontop-mappings that map the data into the urb-vocabulary described in the oblig text. You need to make the following mappings in mappings/ontop_mappings.obda (remember to make unique IRIs for every entity; a sketch of the mapping syntax follows the list):

- Map rows from suburbia.citizen to instances of urb:Person (note that you need separate mappings for assigning urb:worksAt-relationships, e.g. occupation = "farmer" should give a urb:worksAt-relation to the (IRI of) Docks)
- Map rows from suburbia.home to instances of urb:Building
- Map rows from suburbia.organization to instances of urb:Organization
- Map rows from urbania.citizen to instances of urb:Person (you do not need to map apartment_number)
- Map rows from urbania.building to instances of urb:Building
- Map rows from urbania.organization to instances of urb:Organization
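The exact mappings depend on the oblig's schema and vocabulary, but an Ontop mapping file (.obda) generally has the shape sketched below. The prefix IRIs, the column name cid and the IRI chosen for the Docks are illustrative assumptions (occupation is the column mentioned above); adapt them to the oblig's actual schema and namespace:

    [PrefixDeclaration]
    :       http://example.org/data/
    urb:    http://example.org/urb#

    [MappingDeclaration] @collection [[
    mappingId   suburbia-citizen-person
    target      :suburbia/citizen/{cid} a urb:Person .
    source      SELECT cid FROM suburbia.citizen

    mappingId   suburbia-farmer-worksat
    target      :suburbia/citizen/{cid} urb:worksAt :organization/docks .
    source      SELECT cid FROM suburbia.citizen WHERE occupation = 'farmer'
    ]]

Each mapping pairs an SQL query (source) with a Turtle-like template (target) where {column} is filled in per row; building the IRI templates from each table's primary key is one way of keeping IRIs unique per entity. The remaining mappings in the list follow the same pattern.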
Finally, make an ontology in semantics/urb_global.ttl that states the following additional information (a sketch follows the list):

- urb:contains should be the inverse of urb:locatedAt
- urb:hasEmployee should be the inverse of urb:worksAt
- urb:worksAt has range urb:Organization
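All three axioms can be stated with owl:inverseOf and rdfs:range. A minimal sketch of semantics/urb_global.ttl, where the urb: prefix IRI is a placeholder for the namespace used in the oblig:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix urb:  <http://example.org/urb#> .

    # inverse property axioms
    urb:contains    owl:inverseOf urb:locatedAt .
    urb:hasEmployee owl:inverseOf urb:worksAt .

    # range axiom
    urb:worksAt     rdfs:range    urb:Organization .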
You can now use your mappings by writing a SPARQL-query in queries/test.sparql and running

    make test_query

You can e.g. try to run:

    SELECT * WHERE { ?s ?p ?o . }
to get all triples, or query for more specific data. Note that we also get triples derived from the ontology, such as urb:contains-relationships, and that the workplaces of farmers, such as the Docks, are urb:Organizations.
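For a more targeted test, a query along the following lines (again with a placeholder urb: prefix IRI) should list organizations together with their employees, where the urb:hasEmployee-triples only exist because of the inverse-property axiom:

    PREFIX urb: <http://example.org/urb#>

    SELECT ?org ?employee WHERE {
      ?org a urb:Organization ;
           urb:hasEmployee ?employee .
    }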
Exercise 2: Foreign Data Wrappers
In another parallel universe, the city council would like to explore the possibility of using foreign data wrappers. Suburbia's data and Urbania's data reside in separate (PostgreSQL) databases, and they want you to make a foreign data wrapper that exposes Suburbia's data inside Urbania's database, and then make a schema urb that integrates both of the databases under VIEWs with the following columns:

- urb.building(bid, address, zipcode)
- urb.organization(oid, name, offices_in)
- urb.person(urb_id, name, lives_in, works_at)

where urb.building(bid), urb.organization(oid) and urb.person(urb_id) are unique, urb.person(lives_in) references urb.building(bid), urb.person(works_at) references urb.organization(oid), and urb.organization(offices_in) references urb.building(bid).
Use the following Makefile and execute make setup. You now need to fill in credentials for two databases: use your normal database for Urbania (i.e. all of the variables that start with db_u_), and use the remote-DB (it has a name of the form in5800_<username>_remote; you should have gotten an email about this) for the Suburbian DB (the variables starting with db_s). The Makefile loads the Suburbia-schema into the remote-DB (including the suburbia.organization-table), and the Urbania-schema into your normal DB.
Now, make the foreign data wrapper script in the file mappings/fdw.sql that maps the suburbia-schema from the remote-DB into your normal DB.
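With two PostgreSQL databases this is typically done with the postgres_fdw extension: create a foreign server pointing at the remote database, a user mapping with the remote credentials, and import the suburbia schema as foreign tables. The host and credentials below are placeholders for the values you put in the Makefile. A sketch of mappings/fdw.sql:

    -- enable the foreign data wrapper for PostgreSQL
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- foreign server pointing at the remote (Suburbia) database
    CREATE SERVER suburbia_server
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'remote-host.example.org', dbname 'in5800_<username>_remote', port '5432');

    -- credentials used when your local user accesses the remote server
    CREATE USER MAPPING FOR CURRENT_USER
      SERVER suburbia_server
      OPTIONS (user '<remote_user>', password '<remote_password>');

    -- make the remote suburbia schema available as foreign tables locally
    CREATE SCHEMA IF NOT EXISTS suburbia;
    IMPORT FOREIGN SCHEMA suburbia
      FROM SERVER suburbia_server
      INTO suburbia;

After this, suburbia.home, suburbia.citizen and suburbia.organization can be queried from your normal DB as if they were local tables.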
Then, make GAV-mappings into the VIEWs described above. Note that here, too, you need to make unique keys (i.e. bid, oid and urb_id) and corresponding foreign keys.
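In a GAV (global-as-view) setup, each global relation is defined as a view over the sources; here each urb-view can be a UNION ALL of the corresponding Suburbia and Urbania tables, with source-prefixed keys so that the ids stay unique across the two databases. The source column names (hid, bid) below are guesses; use the actual columns from the two schemas. A sketch of one of the views:

    CREATE SCHEMA IF NOT EXISTS urb;

    -- global view over both sources; the prefixes keep bid unique
    CREATE VIEW urb.building(bid, address, zipcode) AS
      SELECT 'sub-' || hid, address, zipcode FROM suburbia.home
      UNION ALL
      SELECT 'urb-' || bid, address, zipcode FROM urbania.building;

urb.organization and urb.person follow the same pattern; the referencing columns (lives_in, works_at and offices_in) must be constructed with the same prefixes so that they match the keys they reference.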
Solution
The solution can be seen here