The city of Urb has gathered quite a lot of data about its citizens and organizations. However, the city has two city parts, Suburbia and Urbania. Urbania is the city’s center with tall buildings, busy streets and lots of people. Suburbia is the suburban part of the city, with farms, parks and villa houses. The two parts are quite different, leading to them having rather different data needs and therefore also different structure and content in their databases.
For city planning and other forms of city management, the city council of Urb now realizes that they need to integrate their data into a common database. They have managed to get most of their data into the same PostgreSQL database. Suburbia’s data is in the schema with name
suburbia, and Urbania’s data is in the schema with name
urbania. However, as noted above, the structure and content of these two schemas are too different to be merged directly. Thus, a more complex data integration job is needed and you are hired to do this task.
The City of Urb’s Data
Suburbia has a database with the following tables (described via the SQL-commands creating the schema):
CREATE TABLE suburbia.home( int PRIMARY KEY, hid NOT NULL, home_address text NOT NULL zip_code text ); CREATE TABLE suburbia.citizen( int PRIMARY KEY, cid UNIQUE, urb_id text NOT NULL, name text int REFERENCES suburbia.home(hid), home -- farmer, fisher, student, pupil occupation text );
suburbia.home contains addresses and zip codes for all houses in the city part, and
suburbia.citizen contains each person’s Urb ID (an identification string given uniquely to each citizen of the entire city of Urb) each person’s name, a reference to the house that is each person’s home, a string containing the each person’s occupation (only farmers, fishers, students and pupils live in Suburbia). If the occupation column is empty (i.e.
NULL), this means that the person is unemployed.
Urbania has a somewhat more complex schema, with the following tables (again, described via the SQL-commands creating the schema):
CREATE TABLE urbania.building( int PRIMARY KEY, bid NOT NULL, address text NOT NULL zipcode text ); CREATE TABLE urbania.organization( oid int PRIMARY KEY, NOT NULL, name text int REFERENCES urbania.building(bid) offices_in ); CREATE TABLE urbania.citizen( int PRIMARY KEY, cid UNIQUE, urb_id text NOT NULL, name text int NOT NULL, apartment_number int REFERENCES urbania.building(bid), apartment_building int REFERENCES urbania.organization(oid) works_at );
urbania.building contains addresses and zip codes for all buildings in the city part;
urbania.organization contains the name of all organizations (companies, school, university, etc.) and a reference to the building that the organization has offices in; and finally
urbania.citizen contains each person’s Urb ID (an identification string given uniquely to each citizen of the entire city of Urb) each person’s name, a reference to the apartment building that each person lives in together with the apartment number, and a reference to the organization that the person is working at. If
works_at is empty (i.e.
NULL), then the person is unemployed. Note that in this schema, a student
works_at the university, and a pupil
works_at the school.
Make a new private Git-repo at UiO’s Github, and check it out to either your local computer (note: you need a working PostgreSQL server running and Java 11 on you local machine to solve the oblig) or at an IFI Linux-machine (you can also access the IFI-machines remotely, see this page (Norwegian) or this page (English) for remote access).
Download this Makefile, and store it in the repo you just made. If you are logged inn to a remote IFI-machine, you can download the file by executing
Open the Makefile in your favorite text editor and update the variables
db_password with the name of your PostgreSQL database, and the username and password for this database. If you are on an IFI-machine (either physically or remote) comment out line 8 (with a
# at the start) and inn (by removing the first
#) line 9 to set the correct Java version. Use
git add Makefile
to add the file to the repo, and finally
git commit -m "Init repo." git push origin master
to commit and push the changes (give your UiO-username and password when prompted). Now make a new branch called
feature/oblig and switch to it by executing:
git checkout -b feature/oblig
Add all you changes done on the following exercises to this branch.
This will create a folder structure you should use for the following exercises, download the programs you need, as well as download the source data from the city council of Urb. Note: Never commit anything from the
program folders to you Git repo, as these are downloaded or made by the Makefile.
Note that you can always execute
make clean_all make setup
to get a completely fresh start if you need it. This will not delete any files that you make, only files downloaded as part of the setup, and clean your database and derived data.
The Makefile contains all commands needed for the mandatory exercise, however each command uses files that you need to make, described in each exercise below.
Note: There is a list of known bugs in the KiWi triplestore on the KiWi page under the heading Known Bugs. It is wise to read through this list so that you do not waste time due to one of these. However, all the exercises can be solved without encountering these bugs.
You are now set to solve all of their problems!
Exercise 1: Direct Mappings
Make a file
mappings/direct_mappings.sql that contains the SQL-commands for adding direct mappings for all the tables in the
urbania schemas. You should use the base URI
http://leifhka.org/in5800/urb/raw_data (note that no slash is added at the end) as prefix for you mappings.
Test that your mappings work by executing
SELECT * FROM triples_str;
This query should return a list of several thousand triples describing the citizens, buildings, homes and organizations in the two city parts.
If you make mistakes, you can clean you database and reapply your mappings by executing
make clean_dbs make direct_mappings
Exercise 2: bOTTR Mappings
The database from Suburbia does not contain any data about buildings other than homes nor any data about organizations. However, the local administration of Suburbia has kept a list of important organizations and the addresses of the buildings they have offices in in a separate CSV file, located at
data/organizations.csv. This data will be important for making the information about who works where explicit.
Thus, you must make a file
mappings/organizations-map.ttl containing a bOTTR mapping and a corresponding template in a file
templates/organizations.stottr in stOTTR format that is used in the mapping. The mapping should map all the data in the file
data/organizations.csv into RDF such that e.g. the following lines in the CSV-file:
Abc, Street road 1, 1234 Def, Road street 2, 5678
would be mapped to the following triples:
@prefix urb: <http://leifhka.org/in5800/urb/> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . urb:Abc rdf:type urb:Organization ; urb:name "Abc" ; urb:locatedAt [ rdf:type urb:Building ; urb:address "Street road 1" ; urb:zipcode "1234" ] . urb:Def rdf:type urb:Organization ; urb:name "Def" ; urb:locatedAt [ rdf:type urb:Building ; urb:address "Road street 2" ; urb:zipcode "5678" ] .
You can add apply the bOTTR-mapping and load the resulting triples into the triplestore with
Again, if you make mistakes, you can clean your database and reapply all of the mappings by executing
make clean_dbs make all_data
Exercise 3: Integration via KiWi-OWL Semantics
The city council is very happy with having all of the data in the same triplestore. However, they realize that despite the data now having the same format (i.e. triples), it is still difficult to make use of it, as different terms and structures are used to represent the same information.
Thus, you realize that adding an ontology harmonizing the data under a common vocabulary would be beneficial. So make an ontology using the KiWi-OWL fragment of OWL (click the link for info on this fragment, the SKWRL-encoding of this semantics is already downloaded to
semantics/kiwi-owl.skwrl) in the file
semantics/urb_ontology.ttl with the following prefixes:
@prefix urb-sc: <http://leifhka.org/in5800/urb/raw_data/suburbia/citizen/> . @prefix urb-uc: <http://leifhka.org/in5800/urb/raw_data/urbania/citizen/> . @prefix urb-sh: <http://leifhka.org/in5800/urb/raw_data/suburbia/home/> . @prefix urb-ub: <http://leifhka.org/in5800/urb/raw_data/urbania/building/> . @prefix urb-uo: <http://leifhka.org/in5800/urb/raw_data/urbania/organization/> . @prefix urb: <http://leifhka.org/in5800/urb/> .
so that the following holds:
urb:Personis the class of all Urb citizens
urb:Buildingis the class of all buildings in Urb (both homes and other buildings)
urb:Organizationis the class of all organizations in Urb, i.e. the place people work
urb:namerelates people and organizations to their names
urb:locatedAtrelates people to the building they live in and organizations (from Urbania) to the building they have offices in (Note: we will fix work locations for Suburbian citizens in the next exercise)
urb:containsshould be the inverse of
urb:addressshould relate buildings (individuals of
urb:Building) to their addresses
urb:zipcodeshould relate buildings (individuals of
urb:Building) to their zip codes
urb:worksAtrelates the people of Urbania to the organizations they work at (note that students and pupils are said to “work at” their universities and schools) (Note: we will fix where people work for Suburbian citizens in the next exercise)
urb:hasEmployeeshould be the inverse of
urb:urbIdshould relate every person (individual of
urb:Person) to their Urb ID.
Note: You only need to specify domain and range of properties if this corresponds to a single class (e.g. the range of
urb:worksAt should be
urb:Organization, but the domain of
urb:name is the union of
urb:Organization which we cannot express directly in KiWi-OWL).
You can add your semantics to the triplestore with
If you make mistakes, as always, you can clean your database and reapply all of the mappings and the ontology by executing
make clean_dbs make urb_ontology
Exercise 4: Integration via rules
The city council is impressed with your work, and see now that much of their data is integrated. However, one thing still remains: The citizens of Suburbia only have occupations in the form of a string (
After talking to the people in the council for a while, you get the following information:
- Farmers are self-employed, and thus should also be organizations and then
urb:worksAt-related to themselves (this also means that the place they work is their own home, i.e. the farm they live on)
- Fishers work at the docks
- Students go to the university in Urbania (i.e. the organization with name “University”)
- Pupils go to school at the Suburbian School
Note that information about the Docks, Suburbia School, etc. was contained in the file
data/organizations.csv that you mapped into the triplestore as part of exercise 2.
What remains for you to do is add semantics stating how the occupations should relate to the different organizations they work at. You quickly realize that this is not possible with KiWi-OWL. Thus, you should make SKWRL-rules to do this. Apply you reasoning with
If you make mistakes, as always, you can clean your database and reapply all of the mappings and the semantics by executing
make clean_dbs make triplestore
Exercise 5: People and Organizations via SPARQL
Now all of the data is integrated and fully usable. The city council of Urb now wants you to create lists of people and organizations within the city.
So, write a SPARQL-query in the file
queries/people.sparql that for each person outputs the name, the Urb ID, the address and zipcode to the building they live in, the apartment number (if they have one), and the IRI of the organization they work for (if not unemployed). Order the result by the name of the person.
Also write a query in the file
queries/organizations.sparql that for each organization in the city outputs the IRI, the name and the address and zipcode of their offices. Order the result by the zipcode and then address of the offices.
Execute your queries with
Exercise 6: Documentation
Before you finish everything and hand over all of your code to the city council of Urb, you realize that the Makefile in the project is completely undocumented.
So make a
README.md file (using Markdown syntax) and write a small guide on how to use the Makefile. It should contain instructions on how to do setup (like you did in the beginning), build the triplestore with all the data and semantics, execute your two queries, and how to clean both the files and the databases.
Before the deadline, add all your changes to the branch
feature/oblig, make a pull request into the
master-branch. Finally, add me (username
leifhka) to you repo and upload a file to Devilry containing only the URL of your repo. I will correct it and make comments inline in you code on the pull request.
If you have any questions, please feel free to ask on any of the available channels for the course, or email me at
leifhka [at] ifi.uio.no.