The city of Urb has gathered quite a lot of data about its citizens and organizations. However, the city has two city parts, Suburbia and Urbania. Urbania is the city’s center with tall buildings, busy streets and lots of people. Suburbia is the suburban part of the city, with farms, parks and villa houses. The two parts are quite different, leading to them having rather different data needs and therefore also different structure and content in their databases.
For city planning and other forms of city management, the city
council of Urb now realizes that they need to integrate their data into
a common database. They have managed to get most of their data into the
same PostgreSQL database. Suburbia’s data is in the schema with name
suburbia, and Urbania’s data is in the schema with name
urbania. However, as noted above, the structure and content
of these two schemas are too different to be merged directly. Thus, a
more complex data integration job is needed, and you are hired to do this job.
The City of Urb’s Data
Suburbia has a database with the following tables (described via the SQL-commands creating the schema):
CREATE TABLE suburbia.home(
    hid int PRIMARY KEY,
    home_address text NOT NULL,
    zip_code text NOT NULL
);

CREATE TABLE suburbia.citizen(
    cid int PRIMARY KEY,
    urb_id text UNIQUE,
    name text NOT NULL,
    home int REFERENCES suburbia.home(hid),
    occupation text -- farmer, fisher, student, pupil
);
suburbia.home contains addresses and zip codes for all houses in the city part, and suburbia.citizen contains each person's Urb ID (an identification string given uniquely to each citizen of the entire city of Urb), each person's name, a reference to the house that is each person's home, and a string containing each person's occupation (only farmers, fishers, students and pupils live in Suburbia). If the occupation column is empty (i.e. NULL), this means that the person is unemployed.
Urbania has a somewhat more complex schema, with the following tables (again, described via the SQL-commands creating the schema):
CREATE TABLE urbania.building(
    bid int PRIMARY KEY,
    address text NOT NULL,
    zipcode text NOT NULL
);

CREATE TABLE urbania.organization(
    oid int PRIMARY KEY,
    name text NOT NULL,
    offices_in int REFERENCES urbania.building(bid)
);

CREATE TABLE urbania.citizen(
    cid int PRIMARY KEY,
    urb_id text UNIQUE,
    name text NOT NULL,
    apartment_number int NOT NULL,
    apartment_building int REFERENCES urbania.building(bid),
    works_at int REFERENCES urbania.organization(oid)
);
urbania.building contains addresses and zip codes for all buildings in the city part; urbania.organization contains the name of all organizations (companies, the school, the university, etc.) and a reference to the building that the organization has offices in; and finally urbania.citizen contains each person's Urb ID (an identification string given uniquely to each citizen of the entire city of Urb), each person's name, a reference to the apartment building that each person lives in together with the apartment number, and a reference to the organization that the person is working at. If works_at is empty (i.e. NULL), then the person is unemployed. Note that in this schema, a student works_at the university, and a pupil works_at the school.
Make a new private Git-repo at UiO's Github, and check it out either to your local computer (note: you need a running PostgreSQL server and Java 11 on your local machine to solve the oblig) or to an IFI Linux-machine (you can also access the IFI-machines remotely; see this page (Norwegian) or this page (English) for remote access).
Download this Makefile, and store it in the repo you just made. If you are logged in to a remote IFI-machine, you can download the file by executing
Open the Makefile in your favorite text editor and update the variables for the name of your PostgreSQL database, and the username, password and host for this database. Then execute

git add Makefile

to add the file to the repo, and finally

git commit -m "Init repo."
git push origin master

to commit and push the changes (give your UiO-username and password when prompted). Now make a new branch called feature/oblig and switch to it by executing:

git checkout -b feature/oblig
Add all your changes for the following exercises to this branch.
Now execute

make setup

This will create the folder structure you should use for the following exercises, download the programs you need, and download the source data from the city council of Urb. Note: Never commit anything in the data or program folders to your Git repo, as these are downloaded or made by the Makefile.
Note that you can always execute
make clean_all
make setup
to get a completely fresh start if you need it. This will not delete any files that you make, only files downloaded as part of the setup, and clean your database and derived data.
The Makefile contains all commands needed for the mandatory exercise; however, each command uses files that you need to make, described in each exercise below. Apart from the variables for the database connection you edited above, you do not need to add anything to the Makefile. However, it is wise to study it and get familiar with what each command does, as you will need to make a similar Makefile for your project work later in the semester.
You are now set to solve all of Urbania’s problems!
Exercise 1: Direct Mappings
Make a file mappings/direct_mappings.lore that contains the Lore-commands for adding direct mappings for all the tables in the suburbia and urbania schemas. You should use the prefix raws for the mappings in the suburbia-schema and rawu for the mappings in the urbania-schema. Make your mappings forward-chaining. (Note: You do not need to map the sequence home_hid_seq, as this is only used by PostgreSQL to generate new keys for suburbia.home.)
Now apply your mappings with:

make direct_mappings
You can test that your mappings work by executing:
This will count the number of triples in the triplestore and should output 8850. (Note: You can change the query in queries/test.sparql to have the above command execute that query instead.)
If you make mistakes, you can clean your database and reapply your mappings by executing

make clean_dbs
make direct_mappings
Exercise 2: Integration via OWL Semantics
The city council is very happy with having all of the data in the same triplestore. However, they realize that despite the data now having the same format (i.e. RDF), it is still difficult to make use of it, as different terms and structures are used to represent the same information.
Thus, you realize that adding an ontology harmonizing the data under
a common vocabulary would be beneficial. So make an ontology using the
Lore-OWL fragment of OWL
(click the link for info on this fragment) in the file
semantics/urb_ontology.ttl with the following prefixes:
@prefix raws: <http://leifhka.org/in5800/urb/raw_data/suburbia/> .
@prefix rawu: <http://leifhka.org/in5800/urb/raw_data/urbania/> .
@prefix urb: <http://leifhka.org/in5800/urb/> .
so that the following holds:
- urb:Person is the class of all Urb citizens
- urb:Building is the class of all buildings in Urb (both homes and other buildings)
- urb:Organization is the class of all organizations in Urb, i.e. the places people work
- urb:name relates people and organizations to their names
- urb:locatedAt relates people to the building they live in and organizations (from Urbania) to the building they have offices in (Note: we will fix work locations for Suburbian citizens in the next exercise)
- urb:contains should be the inverse of urb:locatedAt
- urb:address should relate buildings (individuals of urb:Building) to their addresses
- urb:zipcode should relate buildings (individuals of urb:Building) to their zip codes
- urb:worksAt relates the people of Urbania to the organizations they work at (note that students and pupils are said to "work at" their universities and schools) (Note: we will fix where Suburbian citizens work in the next exercise)
- urb:hasEmployee should be the inverse of urb:worksAt
- urb:urbId should relate every person (individual of urb:Person) to their Urb ID.
Note: You only need to specify the domain or range of a property if it corresponds to a single class (e.g. the range of urb:worksAt should be urb:Organization, but the domain of urb:name is the union of urb:Person and urb:Organization, which we cannot express directly in Lore-OWL). Note that you need to give all classes the type owl:Class, all object properties the type owl:ObjectProperty, and all datatype properties the type owl:DatatypeProperty and a range that is an XSD datatype.
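As a purely syntactic starting point, a fragment of such an ontology could look like the following. The raws:/rawu: class and property names below are assumptions about what your direct mappings produce; adjust them to the names your mappings actually generate.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix raws: <http://leifhka.org/in5800/urb/raw_data/suburbia/> .
@prefix rawu: <http://leifhka.org/in5800/urb/raw_data/urbania/> .
@prefix urb:  <http://leifhka.org/in5800/urb/> .

# Classes
urb:Person a owl:Class .
urb:Building a owl:Class .
raws:citizen rdfs:subClassOf urb:Person .  # assumed direct-mapping class name
rawu:citizen rdfs:subClassOf urb:Person .  # assumed direct-mapping class name

# Object properties
urb:locatedAt a owl:ObjectProperty ;
    rdfs:range urb:Building .
urb:contains a owl:ObjectProperty ;
    owl:inverseOf urb:locatedAt .

# Datatype properties
urb:urbId a owl:DatatypeProperty ;
    rdfs:domain urb:Person ;
    rdfs:range xsd:string .
```

The remaining classes and properties from the list above follow the same pattern.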
You can add your semantics to the triplestore with:

make urb_ontology
If you make mistakes, as always, you can clean your database and reapply all of the mappings and the ontology by executing
make clean_dbs
make direct_mappings
make urb_ontology
Exercise 3: bOTTR Mappings
The database from Suburbia does not contain any data about buildings
other than homes nor any data about organizations. However, the local
administration of Suburbia has kept a list of important organizations
and the addresses of the buildings they have offices in in a separate
CSV file, located at
data/organizations.csv. This data will be important for making the information about who works where complete.
Thus, you must make a file
mappings/organizations-map.ttl containing a bOTTR mapping
and a corresponding template in a file
templates/organizations.stottr in stOTTR format that is
used in the mapping. The mapping should map all the data in the file
data/organizations.csv into RDF using the vocabulary from
the previous exercise such that e.g. the following lines in the CSV file
Abc, Street road 1, 1234
Def, Road street 2, 5678
would be mapped to the following triples:
@prefix urb: <http://leifhka.org/in5800/urb/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

urb:Abc rdf:type urb:Organization ;
        urb:name "Abc" ;
        urb:locatedAt [ rdf:type urb:Building ;
                        urb:address "Street road 1" ;
                        urb:zipcode "1234" ] .

urb:Def rdf:type urb:Organization ;
        urb:name "Def" ;
        urb:locatedAt [ rdf:type urb:Building ;
                        urb:address "Road street 2" ;
                        urb:zipcode "5678" ] .
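One possible shape for the stOTTR template is sketched below. The template IRI (tpl:Organization), the parameter types, and the use of blank nodes in the pattern are assumptions based on general stOTTR syntax; check the OTTR documentation for the exact details.

```stottr
@prefix ottr: <http://ns.ottr.xyz/0.4/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix urb:  <http://leifhka.org/in5800/urb/> .
@prefix tpl:  <http://leifhka.org/in5800/urb/templates/> .

tpl:Organization [ ottr:IRI ?org, xsd:string ?name, xsd:string ?address, xsd:string ?zipcode ] :: {
    ottr:Triple(?org, rdf:type, urb:Organization) ,
    ottr:Triple(?org, urb:name, ?name) ,
    ottr:Triple(?org, urb:locatedAt, _:building) ,
    ottr:Triple(_:building, rdf:type, urb:Building) ,
    ottr:Triple(_:building, urb:address, ?address) ,
    ottr:Triple(_:building, urb:zipcode, ?zipcode)
} .
```

The blank node _:building is introduced in the template body so that each instantiation produces a fresh building node, matching the bracketed blank nodes in the expected output above.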
You can apply the bOTTR-mapping and load the resulting triples into the triplestore with:

make organizations
Again, if you make mistakes, you can clean your database and reapply all of the mappings by executing
make clean_dbs
make direct_mappings urb_ontology organizations
Exercise 4: Integration via rules
The city council is impressed with your work, and sees that much of their data is now integrated. However, one thing still remains: the citizens of Suburbia only have occupations in the form of a string, and are not yet related to the organizations they work at.
After talking to the people in the council for a while, you get the following information:
- Farmers are self-employed, and thus should also be organizations and
urb:worksAt-related to themselves (this also means that the place they work is their own home, i.e. the farm they live on)
- Fishers work at the docks
- Students go to the university in Urbania (i.e. the organization with name “University”)
- Pupils go to school at the Suburbian School
Note that information about the Docks, the Suburbian School, etc. was contained in the file data/organizations.csv that you mapped into the triplestore as part of Exercise 3.
What remains for you to do is add semantics stating how the
occupations should relate to the different organizations they work at.
You quickly realize that this is not possible with Lore-OWL. Thus, you
should make Lore-rules to do this. Put the rules in
semantics/urb_rules.lore. Apply your reasoning with
If you make mistakes, as always, you can clean your database and reapply all of the mappings, the semantics and rules by executing
make clean_dbs
make triplestore
Exercise 5: People and Organizations via SPARQL
Now all of the data is integrated and fully usable. The city council of Urb now wants you to create lists of people and organizations within the city.
So, write a SPARQL-query in the file
queries/people.sparql that for each person outputs the
name, the Urb ID, the address and zipcode to the building they live in,
the apartment number (if they have one), and the IRI of the organization
they work for (if not unemployed). Order the result by the name of the person.
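A skeleton for this query might look as follows. The apartment-number predicate is an assumption about what your direct mapping of urbania.citizen produces; replace it with the predicate your mappings actually generate.

```sparql
PREFIX urb:  <http://leifhka.org/in5800/urb/>
PREFIX rawu: <http://leifhka.org/in5800/urb/raw_data/urbania/>

SELECT ?name ?urbId ?address ?zipcode ?apartmentNumber ?org
WHERE {
  ?p a urb:Person ;
     urb:name ?name ;
     urb:urbId ?urbId ;
     urb:locatedAt ?building .
  ?building urb:address ?address ;
            urb:zipcode ?zipcode .
  OPTIONAL { ?p rawu:apartment_number ?apartmentNumber . }  # assumed predicate name
  OPTIONAL { ?p urb:worksAt ?org . }
}
ORDER BY ?name
```

The two OPTIONAL blocks cover people without an apartment number (Suburbians) and people without a job.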
Also write a query in the file
queries/organizations.sparql that for each organization in
the city outputs the IRI, the name and the address and zipcode of their
offices, as well as the number of people employed by the organization.
Order the result by number of employees.
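Analogously, a sketch of the organization query, using the vocabulary from Exercise 2; the OPTIONAL block ensures organizations without employees still appear with a count of zero:

```sparql
PREFIX urb: <http://leifhka.org/in5800/urb/>

SELECT ?org ?name ?address ?zipcode (COUNT(?p) AS ?employees)
WHERE {
  ?org a urb:Organization ;
       urb:name ?name ;
       urb:locatedAt ?building .
  ?building urb:address ?address ;
            urb:zipcode ?zipcode .
  OPTIONAL { ?org urb:hasEmployee ?p . }
}
GROUP BY ?org ?name ?address ?zipcode
ORDER BY DESC(?employees)
```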
Execute your queries with
Exercise 6: Sectors
The city council thinks your query for finding information about organizations is useful, but see that it is difficult to get an overview for analytical purposes. They therefore want you to introduce sectors (as classes) into your data that groups people into the sector they work in:
- urb:UnemployedSector is the class of all people that do not work.
- urb:PublicSector is the class of people working in public organizations, i.e. the Hospital, Library, Docks, Theater and University.
- urb:SelfEmployedSector is the class of all people that work for themselves (i.e. ?p urb:worksAt ?p).
- urb:PrivateSector is the class of all other working people.
Note that you need to define these IRIs as owl:Class in a separate ontology semantics/sectors.ttl, and then make Lore rules for the actual semantics in semantics/sectors.lore. Make the definitions virtual, i.e. use backward rules.
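The class declarations in semantics/sectors.ttl can be kept simple; a sketch, assuming plain class declarations suffice here:

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix urb: <http://leifhka.org/in5800/urb/> .

urb:UnemployedSector   a owl:Class .
urb:PublicSector       a owl:Class .
urb:SelfEmployedSector a owl:Class .
urb:PrivateSector      a owl:Class .
```

The actual membership conditions then go in the backward rules in semantics/sectors.lore.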
You can load your sector-semantics with
Hint: It is possible to use NOT IN in the WHERE-part of Lore-rules, e.g.:

R(x, y) <- P(x, y) : y NOT IN (SELECT z FROM T).
Finally, make a SPARQL query that finds the number of unemployed people living in each zipcode.
Order the result by number of unemployed people.
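A possible skeleton for this query, assuming the sector classes above are in place:

```sparql
PREFIX urb: <http://leifhka.org/in5800/urb/>

SELECT ?zipcode (COUNT(DISTINCT ?p) AS ?unemployed)
WHERE {
  ?p a urb:UnemployedSector ;
     urb:locatedAt ?building .
  ?building urb:zipcode ?zipcode .
}
GROUP BY ?zipcode
ORDER BY DESC(?unemployed)
```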
You can execute this query with
If you make mistakes, as always, you can clean your database and reapply all of the mappings, the semantics and rules by executing
make clean_dbs
make pipeline
Exercise 7: Documentation
Before you finish everything and hand over all of your code to the city council of Urb, you realize that the Makefile in the project is completely undocumented.
So make a
README.md file (using Markdown syntax) and
write a small guide on how to use the Makefile for the city council of
Urb. It should contain instructions on how to do setup (like you did in
the beginning), build the triplestore with all the data and semantics,
execute your two queries, and how to clean both the files and the
databases. Explain what the effect of each command is in non-technical language, e.g. "This command loads all data about people into the database."
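A minimal skeleton for such a README; the target names below mirror the commands used in this text, so verify them against your own Makefile before handing it over:

```markdown
# Urb Data Integration

## Setup
- `make setup` – downloads the programs and source data, and creates the folder structure.

## Building the triplestore
- `make direct_mappings` – loads the raw data from both city parts into the triplestore.
- `make urb_ontology` – adds the common vocabulary that connects the two city parts' data.
- `make organizations` – adds Suburbia's organizations from the CSV file.

## Queries
- (add the commands for executing the two queries here)

## Cleaning
- `make clean_dbs` – empties the databases.
- `make clean_all` – removes all downloaded and generated files.
```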
Before the deadline, add all your changes to the branch
feature/oblig, make a pull request into the
master-branch (however, do not merge the
pull request!). Finally, add me (username
leifhka) to your repo and upload a file to Devilry containing only the URL of your repo. I will correct it and make comments inline in your code on the pull request.
If you have any questions, please feel free to ask on any of the
available channels for the course, or email me at
leifhka [at] ifi.uio.no.