Lecture 11: Integration
In this lecture, we will take a closer look at data integration, different methods for data integration, and how the technologies we have used so far can be applied to integrate different datasets.
Relevant Wiki-pages
Exercises
As described in the mandatory assignment text, the city of Urb needed to integrate data from different sources, in particular the databases from Suburbia and Urbania. In the mandatory assignment, you integrated their data using direct mappings from Triplelore and a mix of ontology axioms in OWL and Lore-rules.
In other parallel universes, the city council of Urb decides on different technologies, and in these exercises we will look at some of these, namely Ontop and PostgreSQL’s foreign data wrappers.
Exercise 1: Ontop
In the first parallel universe, the city council decides that Ontop will be used for the integration. Start by downloading the following Makefile.
As usual, execute

    make setup

to download the necessary programs and data and to create the needed folders. The Makefile downloads Ontop for you. Update the DB-credentials in the Makefile and execute

    make relational_dbs

to load the data into your database. Note that the data is the same as for the oblig, but the Makefile now also loads the data from data/organizations.csv into a table suburbia.organization.
Now, make Ontop-mappings that map the data into the urb-vocabulary described in the oblig text. You need to make the following mappings in mappings/ontop_mappings.obda (remember to make unique IRIs for every entity; a sketch of the mapping syntax follows the list):

- Map rows from suburbia.citizen to instances of urb:Person (note that you need separate mappings for assigning urb:worksAt-relationships, e.g. occupation = "farmer" should give a urb:worksAt-relation to the (IRI of) Docks)
- Map rows from suburbia.home to instances of urb:Building
- Map rows from suburbia.organization to instances of urb:Organization
- Map rows from urbania.citizen to instances of urb:Person (you do not need to map apartment_number)
- Map rows from urbania.building to instances of urb:Building
- Map rows from urbania.organization to instances of urb:Organization
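The exact mappings depend on the oblig's schema and vocabulary, but an Ontop mapping file (.obda) generally has the shape sketched below. The prefix IRIs, the column name cid and the IRI chosen for the Docks are illustrative assumptions (occupation is the column mentioned above); adapt them to the oblig's actual schema and namespace:

    [PrefixDeclaration]
    :       http://example.org/data/
    urb:    http://example.org/urb#

    [MappingDeclaration] @collection [[
    mappingId   suburbia-citizen-person
    target      :suburbia/citizen/{cid} a urb:Person .
    source      SELECT cid FROM suburbia.citizen

    mappingId   suburbia-farmer-worksat
    target      :suburbia/citizen/{cid} urb:worksAt :organization/docks .
    source      SELECT cid FROM suburbia.citizen WHERE occupation = 'farmer'
    ]]

Each mapping pairs an SQL query (source) with a Turtle-like template (target) where {column} is filled in per row; building the IRI templates from each table's primary key is one way of keeping IRIs unique per entity. The remaining mappings in the list follow the same pattern.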
Finally, make an ontology in semantics/urb_global.ttl that states the following additional information (a sketch follows the list):

- urb:contains should be the inverse of urb:locatedAt
- urb:hasEmployee should be the inverse of urb:worksAt
- urb:worksAt has range urb:Organization
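All three axioms can be stated with owl:inverseOf and rdfs:range. A minimal sketch of semantics/urb_global.ttl, where the urb: prefix IRI is a placeholder for the namespace used in the oblig:

    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix urb:  <http://example.org/urb#> .

    # inverse property axioms
    urb:contains    owl:inverseOf urb:locatedAt .
    urb:hasEmployee owl:inverseOf urb:worksAt .

    # range axiom
    urb:worksAt     rdfs:range    urb:Organization .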
You can now use your mappings by writing a SPARQL-query in queries/test.sparql and running

    make test_query

You can e.g. try to run:

    SELECT * WHERE { ?s ?p ?o . }
to get all triples, or query for more specific data. Note that we also get triples derived from the ontology, such as urb:contains-relationships, and that the workplaces of farmers, such as the Docks, are urb:Organizations.
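For a more targeted test, a query along the following lines (again with a placeholder urb: prefix IRI) should list organizations together with their employees, where the urb:hasEmployee-triples only exist because of the inverse-property axiom:

    PREFIX urb: <http://example.org/urb#>

    SELECT ?org ?employee WHERE {
      ?org a urb:Organization ;
           urb:hasEmployee ?employee .
    }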
Exercise 2: Foreign Data Wrappers
In another parallel universe, the city council would like to explore the possibility of using foreign data wrappers. Suburbia's data and Urbania's data reside in separate (PostgreSQL) databases, and they want you to make a foreign data wrapper that exposes Suburbia's data inside Urbania's database, and then make a schema urb that integrates both of the databases under VIEWs with the following columns:

- urb.building(bid, address, zipcode)
- urb.organization(oid, name, offices_in)
- urb.person(urb_id, name, lives_in, works_at)

where urb.building(bid), urb.organization(oid) and urb.person(urb_id) are unique, urb.person(lives_in) references urb.building(bid), urb.person(works_at) references urb.organization(oid), and urb.organization(offices_in) references urb.building(bid).
Use the following Makefile and execute make setup. You now need to fill in credentials for two databases: use your normal database for Urbania (i.e. all of the variables that start with db_u_), and use the remote-DB (it has a name of the form in5800_<username>_remote; you should have gotten an email about this) for the Suburbian DB (the variables starting with db_s). The Makefile loads the Suburbia-schema into the remote-DB (including the suburbia.organization-table), and the Urbania-schema into your normal DB.
Now, make the foreign data wrapper script in the file mappings/fdw.sql that maps the suburbia-schema from the remote-DB into your normal DB.
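With two PostgreSQL databases this is typically done with the postgres_fdw extension: create a foreign server pointing at the remote database, a user mapping with the remote credentials, and import the suburbia schema as foreign tables. The host and credentials below are placeholders for the values you put in the Makefile. A sketch of mappings/fdw.sql:

    -- enable the foreign data wrapper for PostgreSQL
    CREATE EXTENSION IF NOT EXISTS postgres_fdw;

    -- foreign server pointing at the remote (Suburbia) database
    CREATE SERVER suburbia_server
      FOREIGN DATA WRAPPER postgres_fdw
      OPTIONS (host 'remote-host.example.org', dbname 'in5800_<username>_remote', port '5432');

    -- credentials used when your local user accesses the remote server
    CREATE USER MAPPING FOR CURRENT_USER
      SERVER suburbia_server
      OPTIONS (user '<remote_user>', password '<remote_password>');

    -- make the remote suburbia schema available as foreign tables locally
    CREATE SCHEMA IF NOT EXISTS suburbia;
    IMPORT FOREIGN SCHEMA suburbia
      FROM SERVER suburbia_server
      INTO suburbia;

After this, suburbia.home, suburbia.citizen and suburbia.organization can be queried from your normal DB as if they were local tables.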
Then, make GAV-mappings into the VIEWs described above. Note that here, too, you need to make unique keys (i.e. bid, oid and urb_id) and corresponding foreign keys.
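In a GAV (global-as-view) setup, each global relation is defined as a view over the sources; here each urb-view can be a UNION ALL of the corresponding Suburbia and Urbania tables, with source-prefixed keys so that the ids stay unique across the two databases. The source column names (hid, bid) below are guesses; use the actual columns from the two schemas. A sketch of one of the views:

    CREATE SCHEMA IF NOT EXISTS urb;

    -- global view over both sources; the prefixes keep bid unique
    CREATE VIEW urb.building(bid, address, zipcode) AS
      SELECT 'sub-' || hid, address, zipcode FROM suburbia.home
      UNION ALL
      SELECT 'urb-' || bid, address, zipcode FROM urbania.building;

urb.organization and urb.person follow the same pattern; the referencing columns (lives_in, works_at and offices_in) must be constructed with the same prefixes so that they match the keys they reference.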
Solution
The solution can be seen here