IN5800 – Mandatory Assignment

The city of Urb has gathered quite a lot of data about its citizens and organizations. However, the city has two city parts, Suburbia and Urbania. Urbania is the city’s center with tall buildings, busy streets and lots of people. Suburbia is the suburban part of the city, with farms, parks and villa houses. The two parts are quite different, leading to them having rather different data needs and therefore also different structure and content in their databases.

For city planning and other forms of city management, the city council of Urb now realizes that they need to integrate their data into a common database. They have managed to get most of their data into the same PostgreSQL database. Suburbia’s data is in the schema with name suburbia, and Urbania’s data is in the schema with name urbania. However, as noted above, the structure and content of these two schemas are too different to be merged directly. Thus, a more complex data integration job is needed and you are hired to do this task.

The City of Urb’s Data

Suburbia

Suburbia has a database with the following tables (described via the SQL-commands creating the schema):

CREATE TABLE suburbia.home(
    hid int PRIMARY KEY,
    home_address text NOT NULL,
    zip_code text NOT NULL
);

CREATE TABLE suburbia.citizen(
    cid int PRIMARY KEY,
    urb_id text UNIQUE,
    name text NOT NULL,
    home int REFERENCES suburbia.home(hid),
    occupation text -- farmer, fisher, student, pupil
);

So suburbia.home contains addresses and zip codes for all houses in the city part, and suburbia.citizen contains each person’s Urb ID (an identification string given uniquely to each citizen of the entire city of Urb) each person’s name, a reference to the house that is each person’s home, a string containing the each person’s occupation (only farmers, fishers, students and pupils live in Suburbia). If the occupation column is empty (i.e. NULL), this means that the person is unemployed.

Urbania

Urbania has a somewhat more complex schema, with the following tables (again, described via the SQL-commands creating the schema):

CREATE TABLE urbania.building(
    bid int PRIMARY KEY,
    address text NOT NULL,
    zipcode text NOT NULL
);

CREATE TABLE urbania.organization(
    oid int PRIMARY KEY,
    name text NOT NULL,
    offices_in int REFERENCES urbania.building(bid)
);

CREATE TABLE urbania.citizen(
    cid int PRIMARY KEY,
    urb_id text UNIQUE,
    name text NOT NULL,
    apartment_number int NOT NULL,
    apartment_building int REFERENCES urbania.building(bid),
    works_at int REFERENCES urbania.organization(oid)
);

So urbania.building contains addresses and zip codes for all buildings in the city part; urbania.organization contains the name of all organizations (companies, school, university, etc.) and a reference to the building that the organization has offices in; and finally urbania.citizen contains each person’s Urb ID (an identification string given uniquely to each citizen of the entire city of Urb) each person’s name, a reference to the apartment building that each person lives in together with the apartment number, and a reference to the organization that the person is working at. If works_at is empty (i.e. NULL), then the person is unemployed. Note that in this schema, a student works_at the university, and a pupil works_at the school.

Setup

Make a new private Git-repo at UiO’s Github, and check it out to either your local computer (note: you need a working PostgreSQL server running and Java 11 on you local machine to solve the oblig) or at an IFI Linux-machine (you can also access the IFI-machines remotely, see this page (Norwegian) or this page (English) for remote access).

Download this Makefile, and store it in the repo you just made. If you are logged inn to a remote IFI-machine, you can download the file by executing

wget http://leifhka.org/in5800/oblig/Makefile

Open the Makefile in your favorite text editor and update the variables db_name, db_user and db_password with the name of your PostgreSQL database, and the username and password for this database. If you are on an IFI-machine (either physically or remote) comment out line 8 (with a # at the start) and inn (by removing the first #) line 9 to set the correct Java version. Use

git add Makefile

to add the file to the repo, and finally

git commit -m "Init repo."
git push origin master

to commit and push the changes (give your UiO-username and password when prompted). Now make a new branch called feature/oblig and switch to it by executing:

git checkout -b feature/oblig

Add all you changes done on the following exercises to this branch.

Now, execute

make setup

This will create a folder structure you should use for the following exercises, download the programs you need, as well as download the source data from the city council of Urb. Note: Never commit anything from the data or program folders to you Git repo, as these are downloaded or made by the Makefile.

Note that you can always execute

make clean_all
make setup

to get a completely fresh start if you need it. This will not delete any files that you make, only files downloaded as part of the setup, and clean your database and derived data.

The Makefile contains all commands needed for the mandatory exercise, however each command uses files that you need to make, described in each exercise below.

Note: There is a list of known bugs in the KiWi triplestore on the KiWi page under the heading Known Bugs. It is wise to read through this list so that you do not waste time due to one of these. However, all the exercises can be solved without encountering these bugs.

You are now set to solve all of their problems!

Exercise 1: Direct Mappings

Make a file mappings/direct_mappings.sql that contains the SQL-commands for adding direct mappings for all the tables in the suburbia and urbania schemas. You should use the base URI http://leifhka.org/in5800/urb/raw_data (note that no slash is added at the end) as prefix for you mappings.

Test that your mappings work by executing

make direct_mappings

and execute

SELECT *
FROM triples_str;

This query should return a list of several thousand triples describing the citizens, buildings, homes and organizations in the two city parts.

If you make mistakes, you can clean you database and reapply your mappings by executing

make clean_dbs
make direct_mappings

Exercise 2: bOTTR Mappings

The database from Suburbia does not contain any data about buildings other than homes nor any data about organizations. However, the local administration of Suburbia has kept a list of important organizations and the addresses of the buildings they have offices in in a separate CSV file, located at data/organizations.csv. This data will be important for making the information about who works where explicit.

Thus, you must make a file mappings/organizations-map.ttl containing a bOTTR mapping and a corresponding template in a file templates/organizations.stottr in stOTTR format that is used in the mapping. The mapping should map all the data in the file data/organizations.csv into RDF such that e.g. the following lines in the CSV-file:

Abc, Street road 1, 1234
Def, Road street 2, 5678

would be mapped to the following triples:

@prefix urb: <http://leifhka.org/in5800/urb/> . 
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

urb:Abc rdf:type urb:Organization ;
    urb:name "Abc" ;
    urb:locatedAt [ rdf:type urb:Building ;
                    urb:address "Street road 1" ;
                    urb:zipcode "1234" ] .

urb:Def rdf:type urb:Organization ;
    urb:name "Def" ;
    urb:locatedAt [ rdf:type urb:Building ;
                    urb:address "Road street 2" ;
                    urb:zipcode "5678" ] .

You can add apply the bOTTR-mapping and load the resulting triples into the triplestore with make organizations.

Again, if you make mistakes, you can clean your database and reapply all of the mappings by executing

make clean_dbs
make all_data

Exercise 3: Integration via KiWi-OWL Semantics

The city council is very happy with having all of the data in the same triplestore. However, they realize that despite the data now having the same format (i.e. triples), it is still difficult to make use of it, as different terms and structures are used to represent the same information.

Thus, you realize that adding an ontology harmonizing the data under a common vocabulary would be beneficial. So make an ontology using the KiWi-OWL fragment of OWL (click the link for info on this fragment, the SKWRL-encoding of this semantics is already downloaded to semantics/kiwi-owl.skwrl) in the file semantics/urb_ontology.ttl with the following prefixes:

@prefix urb-sc: <http://leifhka.org/in5800/urb/raw_data/suburbia/citizen/> .
@prefix urb-uc: <http://leifhka.org/in5800/urb/raw_data/urbania/citizen/> .
@prefix urb-sh: <http://leifhka.org/in5800/urb/raw_data/suburbia/home/> .
@prefix urb-ub: <http://leifhka.org/in5800/urb/raw_data/urbania/building/> .
@prefix urb-uo: <http://leifhka.org/in5800/urb/raw_data/urbania/organization/> .
@prefix urb: <http://leifhka.org/in5800/urb/> . 

so that the following holds:

Note: You only need to specify domain and range of properties if this corresponds to a single class (e.g. the range of urb:worksAt should be urb:Organization, but the domain of urb:name is the union of urb:Person and urb:Organization which we cannot express directly in KiWi-OWL).

You can add your semantics to the triplestore with make urb_ontology.

If you make mistakes, as always, you can clean your database and reapply all of the mappings and the ontology by executing

make clean_dbs
make urb_ontology

Exercise 4: Integration via rules

The city council is impressed with your work, and see now that much of their data is integrated. However, one thing still remains: The citizens of Suburbia only have occupations in the form of a string (farmer, fisher, pupil or student).

After talking to the people in the council for a while, you get the following information:

Note that information about the Docks, Suburbia School, etc. was contained in the file data/organizations.csv that you mapped into the triplestore as part of exercise 2.

What remains for you to do is add semantics stating how the occupations should relate to the different organizations they work at. You quickly realize that this is not possible with KiWi-OWL. Thus, you should make SKWRL-rules to do this. Apply you reasoning with make urb_rules.

If you make mistakes, as always, you can clean your database and reapply all of the mappings and the semantics by executing

make clean_dbs
make triplestore

Exercise 5: People and Organizations via SPARQL

Now all of the data is integrated and fully usable. The city council of Urb now wants you to create lists of people and organizations within the city.

So, write a SPARQL-query in the file queries/people.sparql that for each person outputs the name, the Urb ID, the address and zipcode to the building they live in, the apartment number (if they have one), and the IRI of the organization they work for (if not unemployed). Order the result by the name of the person.

Also write a query in the file queries/organizations.sparql that for each organization in the city outputs the IRI, the name and the address and zipcode of their offices. Order the result by the zipcode and then address of the offices.

Execute your queries with

make query_people

and

make query_organizations

Exercise 6: Documentation

Before you finish everything and hand over all of your code to the city council of Urb, you realize that the Makefile in the project is completely undocumented.

So make a README.md file (using Markdown syntax) and write a small guide on how to use the Makefile. It should contain instructions on how to do setup (like you did in the beginning), build the triplestore with all the data and semantics, execute your two queries, and how to clean both the files and the databases.

Delivery

Before the deadline, add all your changes to the branch feature/oblig, make a pull request into the master-branch. Finally, add me (username leifhka) to you repo and upload a file to Devilry containing only the URL of your repo. I will correct it and make comments inline in you code on the pull request.

If you have any questions, please feel free to ask on any of the available channels for the course, or email me at leifhka [at] ifi.uio.no.

Good luck! :)