Lecture 2: Data Structure
This lecture gives an overview of the different structures data can take, the different qualities data can have, and how they are related.
Relevant Wiki pages
- PostgreSQL (relational DB and backend for triplestore)
- RDF (Triple-based data representation)
- Lore (declarative extension of PostgreSQL)
- Triplelore (triplestore, i.e. an RDF DB based on Lore)
Exercises
1. Different structures
a)
You work in a data center with lots of computers. You need to keep track of which computers you have, and their parts, for management and maintenance. You get the following statements from the technicians that installed the computers:
- A computer consists of parts
- Every concrete computer has a unique MAC address that identifies it, and a label
- Every concrete part in a computer has a unique registration number, and a label
- Computer with MAC address 12.34.56.78 and label "Thinkpad X123" has two parts:
  - A part with registration number 987.654.321 and label "Motherboard XYZ"
  - A part with registration number 111.222.333 and label "RAM 8GB"
Unfortunately, you were not able to decide on a type of database, so you decided to make both a triplestore and a relational database for storing the information, with the aim of dropping one of them once it becomes clear which system best fits the task.
So, make a set of triples and a relational database (e.g. with SQL statements) containing the information above. For the triples, you do not need to use any ontology language (such as OWL); simply make resources as you see fit, with intuitive names.
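To make the comparison concrete, here is one possible encoding of the facts above, as a minimal Python sketch rather than a definitive answer. It uses SQLite (via the standard sqlite3 module) as a stand-in for PostgreSQL, and a plain list of (subject, predicate, object) tuples as a stand-in for a triplestore. The table names computer and part, the column in_computer, and resources such as ex:computer1 and ex:partOf are hypothetical choices, not prescribed by the exercise.

```python
import sqlite3

# Relational version: one table per kind of object; each part points to
# the computer it belongs to. Table and column names are hypothetical.
SCHEMA_AND_DATA = """
CREATE TABLE computer (
    mac   text PRIMARY KEY,
    label text
);
CREATE TABLE part (
    reg_nr      text PRIMARY KEY,
    label       text,
    in_computer text REFERENCES computer(mac)
);
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123');
INSERT INTO part VALUES ('987.654.321', 'Motherboard XYZ', '12.34.56.78');
INSERT INTO part VALUES ('111.222.333', 'RAM 8GB', '12.34.56.78');
"""

# Triple version: (subject, predicate, object); the ex: names are made up.
TRIPLES = [
    ("ex:computer1", "rdf:type", "ex:Computer"),
    ("ex:computer1", "ex:macAddress", "12.34.56.78"),
    ("ex:computer1", "ex:label", "Thinkpad X123"),
    ("ex:part1", "rdf:type", "ex:Part"),
    ("ex:part1", "ex:registrationNumber", "987.654.321"),
    ("ex:part1", "ex:label", "Motherboard XYZ"),
    ("ex:part1", "ex:partOf", "ex:computer1"),
    ("ex:part2", "rdf:type", "ex:Part"),
    ("ex:part2", "ex:registrationNumber", "111.222.333"),
    ("ex:part2", "ex:label", "RAM 8GB"),
    ("ex:part2", "ex:partOf", "ex:computer1"),
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory SQLite database holding the facts above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def parts_of_computer(conn: sqlite3.Connection, mac: str) -> list[str]:
    """Labels of the parts directly inside the computer with this MAC address."""
    rows = conn.execute(
        "SELECT label FROM part WHERE in_computer = ? ORDER BY label", (mac,)
    )
    return [label for (label,) in rows]


def labels_of_parts(triples, computer: str) -> list[str]:
    """The same question, answered by pattern matching over the triples."""
    members = {s for (s, p, o) in triples if p == "ex:partOf" and o == computer}
    return sorted(o for (s, p, o) in triples if p == "ex:label" and s in members)


if __name__ == "__main__":
    print(parts_of_computer(build_db(), "12.34.56.78"))
    print(labels_of_parts(TRIPLES, "ex:computer1"))
```

Both calls print ['Motherboard XYZ', 'RAM 8GB']. The relational version fixes the shape of the data in the schema up front, while the triples carry their structure in the predicates.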
b)
The technicians did not give you the full picture in the beginning, and sent you the following updated statements via email:
- Parts may be part of other parts (instead of computers directly)
- Part with registration number 987.654.321 has two parts:
  - A part with registration number 192.837.465 and label "CPU 5.2 GHz"
  - A part with registration number 999.888.777 and label "GPU 4 GHz"
Also, as the data center grows, you now see the need to divide the computers into separate clusters, both for resource management and for maintenance.
These are the updates you get from the technicians once they are done with the clustering:
- Computers are parts of clusters
- Clusters are uniquely identified by a cluster-ID, and each also has a label
- Cluster with cluster-ID 1A and label "main" has as a part the computer with MAC address 12.34.56.78
Update your databases to also contain the information given above.
For SQL, write SQL commands that act on the previously made schema (e.g. ALTER TABLE commands), and for RDF simply list the triples under “Add triples” or “Remove triples”.
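One possible way to apply the changes from b) without dropping or renaming anything is sketched below in Python, again with SQLite standing in for PostgreSQL. The block first recreates a small hypothetical schema for the data from a) so that it runs on its own; the table cluster, the columns in_part and in_cluster, and the ex: resource names are likewise made up for illustration. SQLite's ALTER TABLE is more limited than PostgreSQL's (it can add columns, but cannot add constraints to existing ones), so the sketch only uses ADD COLUMN.

```python
import sqlite3

# Hypothetical schema and data for a), recreated so this block runs alone.
BASE = """
CREATE TABLE computer (mac text PRIMARY KEY, label text);
CREATE TABLE part (
    reg_nr      text PRIMARY KEY,
    label       text,
    in_computer text REFERENCES computer(mac)
);
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123');
INSERT INTO part VALUES ('987.654.321', 'Motherboard XYZ', '12.34.56.78');
INSERT INTO part VALUES ('111.222.333', 'RAM 8GB', '12.34.56.78');
"""

# Changes for b): only adds tables and columns; nothing is dropped or renamed.
UPDATE_B = """
-- A part may now sit inside another part instead of a computer.
ALTER TABLE part ADD COLUMN in_part text REFERENCES part(reg_nr);
INSERT INTO part (reg_nr, label, in_part)
    VALUES ('192.837.465', 'CPU 5.2 GHz', '987.654.321');
INSERT INTO part (reg_nr, label, in_part)
    VALUES ('999.888.777', 'GPU 4 GHz', '987.654.321');

-- Clusters, and which cluster each computer belongs to.
CREATE TABLE cluster (cluster_id text PRIMARY KEY, label text);
ALTER TABLE computer ADD COLUMN in_cluster text REFERENCES cluster(cluster_id);
INSERT INTO cluster VALUES ('1A', 'main');
UPDATE computer SET in_cluster = '1A' WHERE mac = '12.34.56.78';
"""

# Add triples (hypothetical resources; ex:part1 is the motherboard and
# ex:computer1 the Thinkpad in this made-up naming scheme).
ADD_TRIPLES = [
    ("ex:part3", "rdf:type", "ex:Part"),
    ("ex:part3", "ex:registrationNumber", "192.837.465"),
    ("ex:part3", "ex:label", "CPU 5.2 GHz"),
    ("ex:part3", "ex:partOf", "ex:part1"),
    ("ex:part4", "rdf:type", "ex:Part"),
    ("ex:part4", "ex:registrationNumber", "999.888.777"),
    ("ex:part4", "ex:label", "GPU 4 GHz"),
    ("ex:part4", "ex:partOf", "ex:part1"),
    ("ex:cluster1A", "rdf:type", "ex:Cluster"),
    ("ex:cluster1A", "ex:clusterId", "1A"),
    ("ex:cluster1A", "ex:label", "main"),
    ("ex:computer1", "ex:partOf", "ex:cluster1A"),
]


def build_db() -> sqlite3.Connection:
    """Create the a) database in memory and apply the b) changes to it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE + UPDATE_B)
    return conn


if __name__ == "__main__":
    conn = build_db()
    print(conn.execute(
        "SELECT label FROM part WHERE in_part = '987.654.321' ORDER BY label"
    ).fetchall())
```

Because the triples in this sketch reuse the same ex:partOf predicate for every level, the RDF side only needs “Add triples”; nothing has to be removed.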
c)
After some time of operation, you realize that in certain situations it would be useful to view the data center simply as a collection of components arranged in a part-of hierarchy (for example: if a part in a part is broken, the whole part is broken; if a part in a computer is broken, the whole computer is broken; and if a computer in a cluster is broken, the whole cluster is broken). You therefore see the need for a single notion of component, and a single part-of relation that contains the full part-of hierarchy. In such cases, it would also be useful to be able to treat all the different types of objects as a single kind of thing. However, you have already implemented lots of programs using the original databases, so no breaking changes can be made to the original schemas (and no data should be duplicated either).
Thus, your database needs to adhere to the following statements as well:
- All objects (i.e. clusters, computers and parts) should be gathered into a single notion called “component”, each with a unique ID and a label
- The “part of” relations between computers and clusters, between parts and computers, and between parts should all be the same relation
Now further extend or change the data and metadata you made above so that they also include this information. You can assume that cluster-IDs, MAC addresses, and registration numbers are all just text. Also, they all have a distinct form, so e.g. a MAC address will never equal a cluster-ID or a registration number.
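A minimal sketch of one way to meet these constraints in the relational database is to define views over the existing tables, so no table changes and no row is copied. The block assumes a hypothetical schema for a) and b), recreated at the top so it runs on its own in SQLite rather than PostgreSQL. Because cluster-IDs, MAC addresses and registration numbers never coincide, the original keys can serve directly as component IDs, and UNION ALL cannot produce duplicate components. A recursive query over the part_of view then follows the hierarchy upwards, answering questions like “what counts as broken if this part breaks?”. The names component, part_of, and affected_by are illustrative only.

```python
import sqlite3

# Hypothetical schema and data for a) and b), recreated so this runs alone.
BASE = """
CREATE TABLE cluster (cluster_id text PRIMARY KEY, label text);
CREATE TABLE computer (mac text PRIMARY KEY, label text,
                       in_cluster text REFERENCES cluster(cluster_id));
CREATE TABLE part (reg_nr text PRIMARY KEY, label text,
                   in_computer text REFERENCES computer(mac),
                   in_part text REFERENCES part(reg_nr));
INSERT INTO cluster VALUES ('1A', 'main');
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123', '1A');
INSERT INTO part VALUES ('987.654.321', 'Motherboard XYZ', '12.34.56.78', NULL);
INSERT INTO part VALUES ('111.222.333', 'RAM 8GB', '12.34.56.78', NULL);
INSERT INTO part VALUES ('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321');
INSERT INTO part VALUES ('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');
"""

# c): views only, so the original tables stay untouched and nothing is copied.
VIEWS = """
CREATE VIEW component(id, label) AS
    SELECT cluster_id, label FROM cluster
    UNION ALL SELECT mac, label FROM computer
    UNION ALL SELECT reg_nr, label FROM part;

CREATE VIEW part_of(part, whole) AS
    SELECT mac, in_cluster FROM computer WHERE in_cluster IS NOT NULL
    UNION ALL SELECT reg_nr, in_computer FROM part WHERE in_computer IS NOT NULL
    UNION ALL SELECT reg_nr, in_part FROM part WHERE in_part IS NOT NULL;
"""

# The component itself plus everything that (transitively) contains it,
# i.e. everything that counts as broken if that component breaks.
AFFECTED = """
WITH RECURSIVE up(id) AS (
    SELECT ?
    UNION
    SELECT po.whole FROM part_of po JOIN up ON po.part = up.id
)
SELECT c.label FROM up JOIN component c ON c.id = up.id ORDER BY c.label
"""


def build_db() -> sqlite3.Connection:
    """Create the tables, load the data, and add the c) views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE + VIEWS)
    return conn


def affected_by(conn: sqlite3.Connection, component_id: str) -> list[str]:
    """Labels of all components broken when component_id breaks."""
    return [row[0] for row in conn.execute(AFFECTED, (component_id,))]


if __name__ == "__main__":
    print(affected_by(build_db(), "192.837.465"))
```

For the CPU (192.837.465), affected_by returns the CPU itself, the motherboard, the computer and the cluster.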
2. Implicit data
Given the following triples:
ex:carl ex:hasName "Carl Smith";
ex:livesAt "Streetroad 1, 1234, Oslo" .
ex:mary ex:hasName "Mary Smith";
ex:livesAt "Streetalley 2, 2345, Oslo" .
where ex:hasName relates people to their names and ex:livesAt relates people to the address of the house they live in.
Write down at least 10 triples implicit in this data. You can invent new resources, such as ex:hasSameAddress or ex:Person, but give them a natural language definition.
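Some implicit triples can be found by taking the literals apart. The following Python sketch is a minimal illustration, not a complete answer. It assumes names of the form “<first name> <last name>” and addresses of the form “<street>, <postal code>, <city>” (as in the two examples), and derives triples using these hypothetical resources: ex:Person, the class of people; ex:hasFirstName and ex:hasLastName, relating a person to the parts of their name; ex:hasStreetAddress, ex:hasPostalCode and ex:livesInCity, relating a person to the street address, postal code and city of the house they live in; and ex:livesInSameCityAs, relating two different people who live in the same city.

```python
# Explicit triples from the exercise, with literals as Python strings.
EXPLICIT = [
    ("ex:carl", "ex:hasName", "Carl Smith"),
    ("ex:carl", "ex:livesAt", "Streetroad 1, 1234, Oslo"),
    ("ex:mary", "ex:hasName", "Mary Smith"),
    ("ex:mary", "ex:livesAt", "Streetalley 2, 2345, Oslo"),
]


def derive(triples):
    """Return implicit triples, using the hypothetical ex: resources above."""
    out = set()
    city_of = {}
    for s, p, o in triples:
        if p == "ex:hasName":
            # Assumes "<first name> <last name>"
            first, last = o.split(" ", 1)
            out.add((s, "rdf:type", "ex:Person"))
            out.add((s, "ex:hasFirstName", first))
            out.add((s, "ex:hasLastName", last))
        elif p == "ex:livesAt":
            # Assumes "<street>, <postal code>, <city>"
            street, postcode, city = (x.strip() for x in o.split(","))
            out.add((s, "ex:hasStreetAddress", street))
            out.add((s, "ex:hasPostalCode", postcode))
            out.add((s, "ex:livesInCity", city))
            city_of[s] = city
    # Pairwise rule: two different people living in the same city
    for a in city_of:
        for b in city_of:
            if a != b and city_of[a] == city_of[b]:
                out.add((a, "ex:livesInSameCityAs", b))
    return out


if __name__ == "__main__":
    for triple in sorted(derive(EXPLICIT)):
        print(triple)
```

With the two people above this yields 14 triples. Note that rules like these encode assumptions about the data (here, the literal formats) that the triples themselves do not state.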
3. Meaning from structure in the relational model
Given the following relational database schema (defined in SQL):
CREATE TABLE ping(
  int PRIMARY KEY,
  bip int,
  pip int REFERENCES pong(bop),
  kip
);
CREATE TABLE pong(
  int PRIMARY KEY,
  bop int,
  pop
);
CREATE TABLE bang(
  int REFERENCES ping(bip),
  bap int REFERENCES pong(bop),
  pap,
  kap text,
  PRIMARY KEY (bap, pap)
);
What can you say about ping, bip, pip, kip, pong, bop, pop, bang, bap, pap, and kap? What are they?
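Much of what can be said here rests on structural markers alone: which columns form primary keys, and which columns reference which. As an illustration, the Python sketch below reads these markers from SQLite's catalog with PRAGMA table_info and PRAGMA foreign_key_list, and applies one common heuristic: a table whose primary key consists only of references to other tables likely represents a relationship between rows of those tables rather than a thing in its own right. Since some column names and types in the schema above are missing, the sketch uses a separate, purely hypothetical schema with equally meaningless names (foo, bar, baz).

```python
import sqlite3

# Purely hypothetical schema with meaningless names.
SCHEMA = """
CREATE TABLE foo (fa int PRIMARY KEY, fb text);
CREATE TABLE bar (ba int PRIMARY KEY, bb int REFERENCES foo(fa));
CREATE TABLE baz (
    za int REFERENCES foo(fa),
    zb int REFERENCES bar(ba),
    zc text,
    PRIMARY KEY (za, zb)
);
"""


def describe(conn: sqlite3.Connection, table: str) -> dict:
    """Summarise the keys and references of one table from SQLite's catalog."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    # where pk is the 1-based position in the primary key, or 0.
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    pk = [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0]
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fks = {r[3]: f"{r[2]}({r[4]})"
           for r in conn.execute(f"PRAGMA foreign_key_list({table})")}
    # Heuristic only: a key made up entirely of references suggests a
    # relationship table; otherwise treat the table as an entity table.
    kind = "relationship" if pk and all(c in fks for c in pk) else "entity"
    return {"primary_key": pk, "references": fks, "kind": kind}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for name in ("foo", "bar", "baz"):
        print(name, describe(conn, name))
```

Such a heuristic only suggests what the tables are; the names and any documentation are still needed to say what they mean.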
4. Meaning from structure in RDF
Given the following triples:
ex:ping ex:pang ex:peng .
ex:pong ex:pong ex:pang .
ex:peng ex:pong ex:peng .
- What can you say about ex:ping, ex:pang, ex:peng, and ex:pong?
- If you are given the information that ex:pong is a property that only relates properties to other properties, what can you then say about the other resources?
- If you are given the information that ex:pong denotes equality, what can you now say about the resources in the above graph?
- Now assume you are given the information that ex:pong denotes (reflexive) superproperty, i.e. the inverse of subproperty (e.g. ex:knows is a superproperty of ex:hasFriend, and ex:inRelationshipWith is a superproperty of ex:isMarriedTo). What can you now say about the resources in the above graph?
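Under the last reading, the consequences can be computed mechanically by forward chaining. The Python sketch below treats a triple (A, ex:pong, B) as “A is a superproperty of B” and applies three rules until nothing new appears: if s B o holds and A is a superproperty of B, then s A o holds; superproperty is transitive; and it is reflexive on every property. These correspond to the RDFS entailment rules rdfs7, rdfs5 and rdfs6 for rdfs:subPropertyOf, read in the opposite direction. As an assumption of this sketch, a resource counts as a property if it occurs as a predicate or on either side of ex:pong.

```python
SUPER = "ex:pong"  # read (A, SUPER, B) as: A is a superproperty of B

GRAPH = {
    ("ex:ping", "ex:pang", "ex:peng"),
    ("ex:pong", "ex:pong", "ex:pang"),
    ("ex:peng", "ex:pong", "ex:peng"),
}


def closure(graph):
    """Apply the three rules repeatedly until a fixpoint is reached."""
    g = set(graph)
    while True:
        new = set()
        supers = [(a, b) for (a, p, b) in g if p == SUPER]
        # Rule 1: s B o and A superproperty of B  =>  s A o
        for s, p, o in g:
            for a, b in supers:
                if b == p:
                    new.add((s, a, o))
        # Rule 2: superproperty is transitive
        for a, b in supers:
            for c, d in supers:
                if b == c:
                    new.add((a, SUPER, d))
        # Rule 3: reflexive on everything assumed to be a property
        props = {p for (_, p, _) in g}
        props |= {x for (a, b) in supers for x in (a, b)}
        new |= {(x, SUPER, x) for x in props}
        if new <= g:
            return g
        g |= new


if __name__ == "__main__":
    for triple in sorted(closure(GRAPH) - GRAPH):
        print(triple)
```

Comparing closure(GRAPH) with GRAPH shows which triples are new under this interpretation.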
Solution
The solution to these exercises can be found here.