Lecture 11: Integration
In this lecture, we will look more in depth on data integration, different methods for data integration, as well as how the technologies we have used can be used to integrate different datasets.
Relevant Wiki-pages
Exercises
As described in the mandatory assignment text, the city of Urb needed to integrate data from different sources, in particular the databases from Suburbia and Urbania. In the mandatory assignment, you integrated their data using direct mappings from Triplelore and a mix of ontology axioms in OWL and Lore-rules.
In other parallel universes, the city council of Urb decides on different technologies, and in these exercises we will look at some of these, namely Ontop and PostgreSQL’s foreign data wrappers.
Exercise 1: Ontop
In the first parallel universe, the city council decides that Ontop will be used for the integration. Start by downloading the following Makefile.
As usual, execute
make setup
to make download necessary programs and data and make folders. The
Makefile downloads Ontop for you. Update the
DB-credentials in the Makefile and execute
make relational_dbs
to load the data into your database. Note that the data is the same
as for the oblig, but now also loads the data from
data/organizations.csv into a table
suburbia.organization.
Now, make Ontop-mappings that map the data into the
urb-vocabulary described in the
oblig text. You need to make the following mappings in
mappings/ontop_mappings.obda (remember to make unique IRIs
for every entity):
- Map rows from
suburbia.citizento instances ofurb:Person(note that you need separate mappings for assigningurb:worksAt-relationships (e.g.occupation = "farmer"should giveurb:worksAt-relation to the (IRI of) Docks)) - Map rows from
suburbia.hometo instances ofurb:Building - Map rows from
suburbia.organizationto instances ofurb:Organization - Map rows from
urbania.citizento instances ofurb:Person(you do not need to mapapartment_number) - Map rows from
urbania.buildingto instances ofurb:Building - Map rows from
urbania.organizationto instances ofurb:Organization
Finally, make an ontology in semantics/urb_global.ttl
that states the following aditional information:
urb:containsshould be the inverse ofurb:locatedAturb:hasEmployeeshould be the inverse ofurb:worksAturb:worksAthas rangeurb:Organization
You can now use your mappings by making a SPARQL-query in
queries/test.sparql and run
make test_query
You can e.g. try to run:
SELECT *
FROM { ?s ?p ?o . }to get all triples, or query for more specific data. Note that we
also get triples derived from the ontology, such as
urb:contains-relationships and that farmers are
urb:Organizations.
Exercise 2: Foreign Data Wrappers
In another parallel universe, the city council would like to explore
the possibility of using foreign data wrappers. Both Suburbia’s data and
Urbania’s data resides in each (PostgreSQL) database, and they want you
to make a foreign data wrapper of Suburbia’s data into Urbania’s data,
and then make a schema urb that integrates both of the
databases under VIEWs with the following columns:
urb.building(bid, address, zipcode)
urb.organization(oid, name, offices_in)
urb.person(urb_id, name, lives_in, works_at)where urb.building(bid),
urb.organization(oid) and urb.person(urb_id)
are unique, and urb.person(lives_in) references
urb.building(bid), urb.person(works_at)
references urb.organization(oid), and
urb.organization(offices_in) references
urb.building(bid).
Use the following Makefile
and execute make setup. You now need to fill in credentials
for two databases: Use your normal database for Urbania (i.e. all of the
variables that starts with db_u_, and use the remote-DB
(has a name of the form in5800_<username>_remote, you
should have gotten an email about this) for the Suburbian DB (the
variables starting with db_s). The Makefile
loads the Suburbia-schema into the remote-DB (including the
suburbia.organization-table), and the Urbania-schema into
your normal DB.
Now, make the foreign data wrapper script in the file
mappings/fdw.sql that maps the suburbia-schema
from the remote-DB into your normal DB.
Then, make GAV-mappings into the VIEWs described above.
Note that you also here need to make unique keys (i.e. bid,
oid and urb_id) and make corresponding foreign
keys.
Solution
The solution can be seen here