Solutions to exercises on data structures

1. Different structures

a)

Triples:

@prefix ex-o: <http://example.org/ont/> .
@prefix ex: <http://example.org/data/> .

ex-o:Computer ex-o:hasPartsOfType ex-o:Part ;
    ex-o:individualsUniquelyIdentifiedBy ex-o:hasMAC ;
    ex-o:hasOneOf ex-o:hasLabel .

ex-o:Part ex-o:individualsUniquelyIdentifiedBy ex-o:hasRegNr ;
    ex-o:hasOneOf ex-o:hasLabel .
    
ex:computer a ex-o:Computer ;
    ex-o:hasMAC "12.34.56.78" ;
    ex-o:hasLabel "Thinkpad X123" ;
    ex-o:hasPart ex:motherboard, ex:ram .

ex:motherboard a ex-o:Part ;
    ex-o:hasRegNr "987.654.321" ;
    ex-o:hasLabel "Motherboard XYZ" .
    
ex:ram a ex-o:Part ;
    ex-o:hasRegNr "111.222.333" ;
    ex-o:hasLabel "RAM 8GB" .

Relational:

CREATE TABLE computer(
    mac_address text PRIMARY KEY,
    label text NOT NULL
);
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text REFERENCES computer(mac_address)
);

INSERT INTO computer VALUES
('12.34.56.78', 'Thinkpad X123');

INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78'), 
('111.222.333', 'RAM 8GB', '12.34.56.78');

Notes

In making the triplestore, one needs to invent or find a vocabulary that fits the domain. Furthermore, the semantics of this vocabulary need to be documented and/or implemented by some program for the vocabulary to be useful.
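To make the point about semantics concrete, here is a minimal sketch of what "implementing the vocabulary" could look like for the two meta-properties used in a). It assumes triples are plain Python tuples, that "a" stands for rdf:type, that ex-o:hasOneOf P means every instance has exactly one P-value, and that ex-o:individualsUniquelyIdentifiedBy P additionally means no two instances share a P-value. These readings are assumptions; the exercise does not define them.

```python
from collections import defaultdict

# Assumptions (not defined in the exercise):
# - a triple is a (subject, predicate, object) tuple, and "a" stands for rdf:type;
# - (C, ex-o:hasOneOf, P): every instance of C has exactly one P-value;
# - (C, ex-o:individualsUniquelyIdentifiedBy, P): as above, and no two
#   instances of C share a P-value.
TRIPLES = {
    ("ex-o:Computer", "ex-o:individualsUniquelyIdentifiedBy", "ex-o:hasMAC"),
    ("ex-o:Computer", "ex-o:hasOneOf", "ex-o:hasLabel"),
    ("ex-o:Part", "ex-o:individualsUniquelyIdentifiedBy", "ex-o:hasRegNr"),
    ("ex-o:Part", "ex-o:hasOneOf", "ex-o:hasLabel"),
    ("ex:computer", "a", "ex-o:Computer"),
    ("ex:computer", "ex-o:hasMAC", "12.34.56.78"),
    ("ex:computer", "ex-o:hasLabel", "Thinkpad X123"),
    ("ex:motherboard", "a", "ex-o:Part"),
    ("ex:motherboard", "ex-o:hasRegNr", "987.654.321"),
    ("ex:motherboard", "ex-o:hasLabel", "Motherboard XYZ"),
    ("ex:ram", "a", "ex-o:Part"),
    ("ex:ram", "ex-o:hasRegNr", "111.222.333"),
    ("ex:ram", "ex-o:hasLabel", "RAM 8GB"),
}


def values(triples, subject, prop):
    """All objects o such that (subject, prop, o) is in triples."""
    return {o for (s, p, o) in triples if s == subject and p == prop}


def violations(triples):
    """Return a list of human-readable constraint violations."""
    problems = []
    for (cls, meta, prop) in sorted(triples):
        if meta not in ("ex-o:hasOneOf", "ex-o:individualsUniquelyIdentifiedBy"):
            continue
        instances = sorted(s for (s, p, o) in triples if p == "a" and o == cls)
        owners = defaultdict(list)  # value -> instances that have it
        for inst in instances:
            vals = values(triples, inst, prop)
            if len(vals) != 1:
                problems.append(f"{inst} has {len(vals)} values for {prop}")
            for v in vals:
                owners[v].append(inst)
        if meta == "ex-o:individualsUniquelyIdentifiedBy":
            for v, insts in sorted(owners.items()):
                if len(insts) > 1:
                    problems.append(f"{prop} value {v} is shared by {insts}")
    return problems
```

Here violations(TRIPLES) returns an empty list, while adding a second part with registration number "111.222.333" produces a "shared by" violation. Every such check has to be written by hand, which is the contrast with the relational schema drawn next.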

For the relational schema, we can simply use constraints (primary keys, NOT NULL and foreign keys) to express much of the information, and there is a fairly straightforward way to encode all of it.
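As a quick check that the constraints really carry this information, the schema from a) can be loaded into SQLite through Python's built-in sqlite3 module. This is a sketch: computer is created before part (PostgreSQL, for instance, rejects a REFERENCES clause naming a table that does not exist yet), and SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The helper name rejected is made up for this example.

```python
import sqlite3

# Schema and data from a); computer is created first so the
# REFERENCES target exists when part is created.
SCHEMA = """
CREATE TABLE computer(
    mac_address text PRIMARY KEY,
    label text NOT NULL
);
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text REFERENCES computer(mac_address)
);
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123');
INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78'),
('111.222.333', 'RAM 8GB', '12.34.56.78');
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    # SQLite ignores REFERENCES clauses unless this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def rejected(conn, sql, params=()):
    """True if the RDBMS refuses the statement because of a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

A duplicate MAC address, a part without a label and a part pointing at an unknown computer are all refused by the database itself, with no extra program needed.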

b)

Triples:

Add triples:

ex-o:Part ex-o:hasPartsOfType ex-o:Part .

ex:motherboard ex-o:hasPart ex:cpu, ex:gpu .

ex:cpu a ex-o:Part ;
    ex-o:hasRegNr "192.837.465" ;
    ex-o:hasLabel "CPU 5.2 GHz" .

ex:gpu a ex-o:Part ;
    ex-o:hasRegNr "999.888.777" ;
    ex-o:hasLabel "GPU 4 GHz" .

ex-o:Cluster ex-o:hasPartsOfType ex-o:Computer ;
    ex-o:individualsUniquelyIdentifiedBy ex-o:hasClusterID ;
    ex-o:hasOneOf ex-o:hasLabel .
    
ex:cluster a ex-o:Cluster ;
    ex-o:hasClusterID "1A" ;
    ex-o:hasLabel "main" ;
    ex-o:hasPart ex:computer .

Relational:

ALTER TABLE part
ADD COLUMN partOfPart text REFERENCES part(registration_number);

INSERT INTO part VALUES
('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321'),
('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');

CREATE TABLE cluster(
    clusterID text PRIMARY KEY,
    label text NOT NULL
);

ALTER TABLE computer
ADD COLUMN partOfCluster text REFERENCES cluster(clusterID);

INSERT INTO cluster VALUES
('1A', 'main');

UPDATE computer
SET partOfCluster = '1A'
WHERE mac_address = '12.34.56.78';

Notes

It seems easier in this case to extend the relational schema, as it only takes a few simple commands. However, assuming a part that is part of another part is not also (directly) part of a computer, we will get a lot of NULL values in the part table. Queries over this data may also need to handle these NULL values in some way, which complicates data access.
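The NULL-handling cost shows up in a query such as "all parts of a computer, including parts of parts". The sketch below assumes SQLite (via Python's sqlite3) and writes the schema after b) directly in its final form instead of through ALTER TABLE; the function name all_part_labels is hypothetical. Because each part row has its parent in exactly one of two nullable columns, the recursive query has to start from partOfComputer and then continue through partOfPart.

```python
import sqlite3

# Tables and data as they stand after b), created in their final form.
SETUP = """
CREATE TABLE cluster(clusterID text PRIMARY KEY, label text NOT NULL);
CREATE TABLE computer(
    mac_address text PRIMARY KEY,
    label text NOT NULL,
    partOfCluster text REFERENCES cluster(clusterID)
);
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text REFERENCES computer(mac_address),
    partOfPart text REFERENCES part(registration_number)
);
INSERT INTO cluster VALUES ('1A', 'main');
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123', '1A');
INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78', NULL),
('111.222.333', 'RAM 8GB', '12.34.56.78', NULL),
('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321'),
('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');
"""

# Direct parts come from partOfComputer, indirect ones from partOfPart.
ALL_PARTS = """
WITH RECURSIVE sub(regnr) AS (
    SELECT registration_number FROM part WHERE partOfComputer = ?
    UNION
    SELECT p.registration_number
    FROM part AS p JOIN sub ON p.partOfPart = sub.regnr
)
SELECT p.label FROM part AS p JOIN sub ON p.registration_number = sub.regnr
ORDER BY p.label;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def all_part_labels(conn, mac):
    return [row[0] for row in conn.execute(ALL_PARTS, (mac,))]
```

Here all_part_labels(build_db(), "12.34.56.78") returns the labels of all four parts. Note also that a condition like partOfComputer = NULL never matches anything; NULLs must be tested with IS NULL, which is one of the ways these values complicate queries.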

c)

Triples:

Add triples:

ex-o:Component ex-o:hasPartsOfType ex-o:Component ;
    ex-o:individualsUniquelyIdentifiedBy ex-o:hasID ;
    ex-o:hasOneOf ex-o:hasLabel .
    
ex-o:Cluster ex-o:subClassOf ex-o:Component .
ex-o:Computer ex-o:subClassOf ex-o:Component .
ex-o:Part ex-o:subClassOf ex-o:Component .

If one had instead made a separate property for each type of has-part relationship in a) and b), e.g. ex-o:computerHasPart and ex-o:partHasPart, then one could either delete those triples and add ex-o:hasPart triples instead, or simply add the following triples:

ex-o:computerHasPart ex-o:subPropertyOf ex-o:hasPart .
ex-o:partHasPart ex-o:subPropertyOf ex-o:hasPart .
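These triples only pay off if something interprets ex-o:subClassOf and ex-o:subPropertyOf. A minimal forward-chaining sketch follows, assuming they are read like RDFS's rdfs:subClassOf and rdfs:subPropertyOf and that "a" stands for rdf:type. It uses the hypothetical property names ex-o:computerHasPart and ex-o:partHasPart from the paragraph above.

```python
# Triples are (subject, predicate, object) tuples; "a" stands for rdf:type.
DATA = {
    ("ex-o:Computer", "ex-o:subClassOf", "ex-o:Component"),
    ("ex-o:Part", "ex-o:subClassOf", "ex-o:Component"),
    ("ex-o:computerHasPart", "ex-o:subPropertyOf", "ex-o:hasPart"),
    ("ex-o:partHasPart", "ex-o:subPropertyOf", "ex-o:hasPart"),
    ("ex:computer", "a", "ex-o:Computer"),
    ("ex:motherboard", "a", "ex-o:Part"),
    ("ex:computer", "ex-o:computerHasPart", "ex:motherboard"),
    ("ex:motherboard", "ex-o:partHasPart", "ex:cpu"),
}


def closure(triples):
    """Apply two RDFS-style rules until no new triples are derived."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (x, q, y) in facts:
                # s a C, C subClassOf D  =>  s a D
                if p == "a" and q == "ex-o:subClassOf" and x == o:
                    new.add((s, "a", y))
                # s P o, P subPropertyOf Q  =>  s Q o
                if q == "ex-o:subPropertyOf" and x == p:
                    new.add((s, y, o))
        if new <= facts:
            return facts
        facts |= new
```

After closure(DATA), ex:computer is also an ex-o:Component and the motherboard ex-o:hasPart the CPU, so queries can be written against the general terms only.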

Relational:

Either by making the new terms views:

CREATE VIEW component(id, label, ctype) AS
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL
SELECT mac_address, label, 'computer' FROM computer
UNION ALL
SELECT registration_number, label, 'part' FROM part;

CREATE VIEW partOf(part, whole) AS
SELECT mac_address, partOfCluster
FROM computer
WHERE partOfCluster IS NOT NULL
UNION ALL
SELECT registration_number, partOfComputer
FROM part
WHERE partOfComputer IS NOT NULL
UNION ALL
SELECT registration_number, partOfPart
FROM part
WHERE partOfPart IS NOT NULL;
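The view-based variant can be tried out in SQLite through Python's sqlite3 module. The sketch assumes the tables as they stand after b), created directly in their final form, and uses the column name registration_number from a).

```python
import sqlite3

BASE = """
CREATE TABLE cluster(clusterID text PRIMARY KEY, label text NOT NULL);
CREATE TABLE computer(
    mac_address text PRIMARY KEY,
    label text NOT NULL,
    partOfCluster text REFERENCES cluster(clusterID)
);
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text REFERENCES computer(mac_address),
    partOfPart text REFERENCES part(registration_number)
);
INSERT INTO cluster VALUES ('1A', 'main');
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123', '1A');
INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78', NULL),
('111.222.333', 'RAM 8GB', '12.34.56.78', NULL),
('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321'),
('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');
"""

# The new, general terms as views over the old tables.
VIEWS = """
CREATE VIEW component(id, label, ctype) AS
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL
SELECT mac_address, label, 'computer' FROM computer
UNION ALL
SELECT registration_number, label, 'part' FROM part;

CREATE VIEW partOf(part, whole) AS
SELECT mac_address, partOfCluster FROM computer WHERE partOfCluster IS NOT NULL
UNION ALL
SELECT registration_number, partOfComputer FROM part WHERE partOfComputer IS NOT NULL
UNION ALL
SELECT registration_number, partOfPart FROM part WHERE partOfPart IS NOT NULL;
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE + VIEWS)
    return conn
```

Supporting a new type of component (say, servers) would mean adding another branch to both view definitions, which is the maintenance cost discussed in the notes.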

or, by making the old terms views:

CREATE TABLE component(
    id text PRIMARY KEY,
    label text NOT NULL,
    ctype text NOT NULL
);

INSERT INTO component
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL
SELECT mac_address, label, 'computer' FROM computer
UNION ALL
SELECT registration_number, label, 'part' FROM part;

CREATE TABLE partOf(
    part text REFERENCES component(id),
    whole text REFERENCES component(id)
);

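INSERT INTO partOf
SELECT mac_address, partOfCluster
FROM computer
WHERE partOfCluster IS NOT NULL
UNION ALL
SELECT registration_number, partOfComputer
FROM part
WHERE partOfComputer IS NOT NULL
UNION ALL
SELECT registration_number, partOfPart
FROM part
WHERE partOfPart IS NOT NULL;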

DROP TABLE part;
DROP TABLE computer;
DROP TABLE cluster;

CREATE VIEW part(registration_number, label, partOfComputer, partOfPart) AS
SELECT c.id, c.label, p.whole, NULL
FROM component AS c JOIN partof AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'computer')
UNION ALL
SELECT c.id, c.label, NULL, p.whole
FROM component AS c JOIN partof AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'part');

CREATE VIEW computer(mac_address, label, partOfCluster) AS
SELECT c.id, c.label, p.whole
FROM component AS c LEFT JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'computer';

CREATE VIEW cluster(clusterID, label) AS
SELECT id, label
FROM component
WHERE ctype = 'cluster';
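Since the second variant is a migration, it is worth checking that queries written against the old tables still work afterwards. The sketch below runs the whole sequence in SQLite via Python's sqlite3. It assumes the part view exposes the original column name registration_number and that the computer view uses a LEFT JOIN, so that a computer without a cluster is not lost; both are choices made for this example.

```python
import sqlite3

OLD = """
CREATE TABLE cluster(clusterID text PRIMARY KEY, label text NOT NULL);
CREATE TABLE computer(mac_address text PRIMARY KEY, label text NOT NULL,
                      partOfCluster text REFERENCES cluster(clusterID));
CREATE TABLE part(registration_number text PRIMARY KEY, label text NOT NULL,
                  partOfComputer text REFERENCES computer(mac_address),
                  partOfPart text REFERENCES part(registration_number));
INSERT INTO cluster VALUES ('1A', 'main');
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123', '1A');
INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78', NULL),
('111.222.333', 'RAM 8GB', '12.34.56.78', NULL),
('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321');
"""

MIGRATE = """
CREATE TABLE component(id text PRIMARY KEY, label text NOT NULL, ctype text NOT NULL);
INSERT INTO component
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL SELECT mac_address, label, 'computer' FROM computer
UNION ALL SELECT registration_number, label, 'part' FROM part;

CREATE TABLE partOf(part text REFERENCES component(id),
                    whole text REFERENCES component(id));
INSERT INTO partOf
SELECT mac_address, partOfCluster FROM computer WHERE partOfCluster IS NOT NULL
UNION ALL SELECT registration_number, partOfComputer FROM part WHERE partOfComputer IS NOT NULL
UNION ALL SELECT registration_number, partOfPart FROM part WHERE partOfPart IS NOT NULL;

-- Drop children before parents.
DROP TABLE part;
DROP TABLE computer;
DROP TABLE cluster;

-- The old terms, now as views over component and partOf.
CREATE VIEW part(registration_number, label, partOfComputer, partOfPart) AS
SELECT c.id, c.label, p.whole, NULL
FROM component AS c JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'computer')
UNION ALL
SELECT c.id, c.label, NULL, p.whole
FROM component AS c JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'part');

CREATE VIEW computer(mac_address, label, partOfCluster) AS
SELECT c.id, c.label, p.whole
FROM component AS c LEFT JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'computer';

CREATE VIEW cluster(clusterID, label) AS
SELECT id, label FROM component WHERE ctype = 'cluster';
"""


def migrated_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(OLD)
    conn.executescript(MIGRATE)
    return conn
```

After the migration, old queries such as SELECT ... FROM computer keep working, and a new kind of component is just another ctype value in component, with no table changes.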

Notes

The triples are, compared to the relational schema, easier to extend. If other types of components (e.g. servers, data centers, etc.) were to be added to the hierarchy, the triplestore only needs a few added triples.

Both of the relational approaches require more substantial updates: the first would need to drop and recreate the views whenever a new type of component is added, while the second is easier to maintain as new types of components are added, but requires a rather large rewrite of the original schema.

However, the relational schema is easier to use and verify, as the metadata is expressed as constraints and is therefore automatically checked by any RDBMS. For the triples, the semantics of the vocabulary either needs to be implemented directly, or defined in terms of an existing vocabulary for which reasoners are implemented. Reasoners can typically check for consistency, but often cannot do more complex forms of constraint checking.

The point of this exercise was really to get a feel for different data structures, and the properties they have. We will learn more about semantics, reasoning, etc. later in the course!

2. Implicit data

ex:carl ex:hasGivenName "Carl" ; # The individual's given/first name is "Carl"
  ex:hasSurname "Smith" ; # The individual's surname/last name is "Smith"
  ex:livesAtStreet ex:streetroad ; # The individual lives at a street named "Streetroad"
  ex:livesInCity ex:oslo ; # The individual lives in a city named "Oslo"
  rdf:type ex:Person ; # The individual denotes a person
  ex:hasSameSurnameAs ex:mary ; # The person has the same surname as ex:mary
  ex:hasDifferentFirstNameAs ex:mary ; # The person has a different given name from ex:mary
  ex:livesInSameCityWith ex:mary ; # The person lives in the same city as ex:mary
  ex:isDifferentFrom ex:mary ; # The person is not the same person as ex:mary (assuming a person can only live in one house or have only one given name)
  rdf:type ex:NotHomeless, ex:NamedPerson . # The person is not homeless and has a name
  
# Can express many of the same or similar statements for ex:mary as above.
# However, can also start describing e.g. the addresses:

[] rdf:type ex:Address ;
  ex:address "Streetroad 1, 1234, Oslo" ;
  ex:inZipCode "1234" ;
  ex:onStreet ex:streetroad ;
  ex:inCity ex:oslo .
  
# And then describe streets:

ex:streetroad rdf:type ex:Street ;
  ex:name "Streetroad" ;
  ex:inCity ex:oslo .

# And then cities:

ex:oslo rdf:type ex:City ;
  ex:name "Oslo" ;
  ex:hasCitizens ex:carl, ex:mary .

# And so on...
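Much of the implicit data above can be derived mechanically from a few explicit facts per person. A minimal sketch in Python follows; Mary's values are invented for illustration (her data from the exercise is not reproduced here), and ex:isDifferentFrom is derived from differing given names only, under the assumption that a person has just one given name.

```python
# Explicit facts per person. Carl's come from the triples above;
# Mary's are hypothetical.
PEOPLE = {
    "ex:carl": {"givenName": "Carl", "street": "ex:streetroad", "city": "ex:oslo"},
    "ex:mary": {"givenName": "Mary", "street": "ex:otherstreet", "city": "ex:oslo"},
}


def implicit_triples(people):
    """Derive some of the implicit statements as (s, p, o) tuples."""
    derived = set()
    for a, fa in people.items():
        if fa.get("givenName"):
            derived.add((a, "rdf:type", "ex:NamedPerson"))
        if fa.get("street"):
            derived.add((a, "rdf:type", "ex:NotHomeless"))
        for b, fb in people.items():
            if a == b:
                continue
            if fa["city"] == fb["city"]:
                derived.add((a, "ex:livesInSameCityWith", b))
            if fa["givenName"] != fb["givenName"]:
                derived.add((a, "ex:hasDifferentFirstNameAs", b))
                # Only valid if each person has exactly one given name.
                derived.add((a, "ex:isDifferentFrom", b))
    return derived
```

Two explicit people already yield ten derived triples here, which illustrates how quickly implicit data grows relative to the explicit data.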

3. Meaning from structure in the relational model

ping is an entity type identified by a bip, with an attribute pip that is an integer. A ping can be kip-related to (at most one) pong.
pong is an entity type identified by a bop with an integer attribute pop.
pings and pongs can be bang-related, where the relationship has a textual kap-attribute.
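For reference, one relational schema that supports this reading is sketched below, in SQLite via Python's sqlite3. It is a hypothetical reconstruction, not the exercise's actual schema: the single nullable kip column encodes "at most one pong", and the separate bang table encodes a many-to-many relationship carrying kap. Whether (ping, pong) is a key of bang is not stated, so no key is declared; the helper accepted is made up for this example.

```python
import sqlite3

# Hypothetical schema matching the reading above (not the exercise's own DDL).
SCHEMA = """
CREATE TABLE pong(
    bop text PRIMARY KEY,              -- a pong is identified by a bop
    pop integer                        -- integer attribute
);
CREATE TABLE ping(
    bip text PRIMARY KEY,              -- a ping is identified by a bip
    pip integer,                       -- integer attribute
    kip text REFERENCES pong(bop)      -- at most one kip-related pong
);
CREATE TABLE bang(                     -- many-to-many ping/pong relationship
    ping text NOT NULL REFERENCES ping(bip),
    pong text NOT NULL REFERENCES pong(bop),
    kap text                           -- textual attribute of the relationship
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def accepted(conn, sql, params=()):
    """True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return False
    return True
```

One ping can only name a single pong through kip, but it can be bang-related to any number of pongs.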

4. Meaning from structure in RDF

We know that ex:pong is a property. Furthermore, ex:pong is not irreflexive (since ex:pong relates ex:peng to itself), so it cannot be, e.g., a “different from”, “strictly less than” or “mother of” property, or anything similar. ex:pong can also relate other properties. ex:pang is also a property.
If ex:pong relates only properties, then ex:peng is also a property, and thus ex:pang can relate things to properties (as ex:ping is ex:pang-related to ex:peng).
If ex:pong denotes equality, then so does ex:pang, as ex:pong is ex:pong-related to ex:pang. This then implies that ex:ping equals ex:peng.
If ex:pong denotes superproperty, then ex:pang is a subproperty of the superproperty predicate (as ex:pong is ex:pong-related to ex:pang), so anything related by ex:pang is also related by superproperty. This then implies that ex:ping is a superproperty of ex:peng.
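The superproperty case can be checked mechanically. The sketch below assumes the exercise's triples were exactly the three implied by the text (ex:peng ex:pong ex:peng, ex:pong ex:pong ex:pang and ex:ping ex:pang ex:peng); this is a reconstruction, not the original data. It reads ex:pong as "is a superproperty of" and applies the rule (P ex:pong Q), (s Q o) => (s P o) until nothing new is derived.

```python
# Hypothetical reconstruction of the exercise's triples.
TRIPLES = {
    ("ex:peng", "ex:pong", "ex:peng"),
    ("ex:pong", "ex:pong", "ex:pang"),
    ("ex:ping", "ex:pang", "ex:peng"),
}


def superproperty_closure(triples, sup):
    """Treat `sup` as 'is a superproperty of': (P sup Q), (s Q o) => (s P o)."""
    facts = set(triples)
    while True:
        new = {(s, p, o)
               for (p, r, q) in facts if r == sup
               for (s, q2, o) in facts if q2 == q}
        if new <= facts:
            return facts
        facts |= new
```

The closure contains ("ex:ping", "ex:pong", "ex:peng"), i.e. under this reading ex:ping is a superproperty of ex:peng, matching the conclusion above.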