Solutions to exercises on data structures
1. Different structures
a)
Triples:
@prefix ex-o: <http://example.org/ont/> .
@prefix ex: <http://example.org/data/> .
ex-o:Computer ex-o:hasPartsOfType ex-o:Part ;
ex-o:individualsUniquelyIdentifiedBy ex-o:hasMAC ;
ex-o:hasOneOf ex-o:hasLabel .
ex-o:Part ex-o:individualsUniquelyIdentifiedBy ex-o:hasRegNr ;
ex-o:hasOneOf ex-o:hasLabel .
ex:computer a ex-o:Computer ;
ex-o:hasMAC "12.34.56.78" ;
ex-o:hasLabel "Thinkpad X123" ;
ex-o:hasPart ex:motherboard, ex:ram .
ex:motherboard a ex-o:Part ;
ex-o:hasRegNr "987.654.321" ;
ex-o:hasLabel "Motherboard XYZ" .
ex:ram a ex-o:Part ;
ex-o:hasRegNr "111.222.333" ;
ex-o:hasLabel "RAM 8GB" .
Relational:
CREATE TABLE computer(
mac_address text PRIMARY KEY,
label text NOT NULL
);
CREATE TABLE part(
registration_number text PRIMARY KEY,
label text NOT NULL,
partOfComputer text REFERENCES computer(mac_address)
);
INSERT INTO computer VALUES
('12.34.56.78', 'Thinkpad X123');
INSERT INTO part VALUES
('987.654.321', 'Motherboard XYZ', '12.34.56.78'),
('111.222.333', 'RAM 8GB', '12.34.56.78');
Notes
In making the triplestore, one needs to invent or find a vocabulary that fits the domain. Furthermore, the semantics of this vocabulary need to be documented and/or implemented by some program for the vocabulary to be useful.
For the relational schema, we can simply use constraints to express much of the information, and there is a fairly straightforward way to encode all of the information.
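To illustrate how the constraints carry part of the meaning, here is a minimal Python sketch using the standard-library sqlite3 module. The choice of SQLite is an assumption (the exercise does not prescribe a database), and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`. The helper `try_insert` and the rows 'Orphan part' and 'Duplicate' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE computer(
    mac_address text PRIMARY KEY,
    label text NOT NULL
);
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text REFERENCES computer(mac_address)
);
INSERT INTO computer VALUES ('12.34.56.78', 'Thinkpad X123');
INSERT INTO part VALUES ('987.654.321', 'Motherboard XYZ', '12.34.56.78');
""")

def try_insert(row):
    """Return True if the schema accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO part VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_valid = try_insert(("111.222.333", "RAM 8GB", "12.34.56.78"))
ok_orphan = try_insert(("000.000.000", "Orphan part", "99.99.99.99"))  # unknown computer
ok_duplicate = try_insert(("987.654.321", "Duplicate", "12.34.56.78"))  # reused key
print(ok_valid, ok_orphan, ok_duplicate)
```

The database rejects the last two rows without any extra code, whereas the triplestore would accept the corresponding triples unless some program implements the meaning of ex-o:individualsUniquelyIdentifiedBy.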
b)
Triples:
Add triples:
ex-o:Part ex-o:hasPartsOfType ex-o:Part .
ex:motherboard ex-o:hasPart ex:cpu, ex:gpu .
ex:cpu a ex-o:Part ;
ex-o:hasRegNr "192.837.465" ;
ex-o:hasLabel "CPU 5.2 GHz" .
ex:gpu a ex-o:Part ;
ex-o:hasRegNr "999.888.777" ;
ex-o:hasLabel "GPU 4 GHz" .
ex-o:Cluster ex-o:hasPartsOfType ex-o:Computer ;
ex-o:individualsUniquelyIdentifiedBy ex-o:hasClusterID ;
ex-o:hasOneOf ex-o:hasLabel .
ex:cluster a ex-o:Cluster ;
ex-o:hasClusterID "1A" ;
ex-o:hasLabel "main" ;
ex-o:hasPart ex:computer .
Relational:
ALTER TABLE part
ADD COLUMN partOfPart text REFERENCES part(registration_number);
INSERT INTO part VALUES
('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321'),
('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');
CREATE TABLE cluster(
clusterID text PRIMARY KEY,
label text NOT NULL
);
ALTER TABLE computer
ADD COLUMN partOfCluster text REFERENCES cluster(clusterID);
INSERT INTO cluster VALUES
('1A', 'main');
UPDATE computer
SET partOfCluster = '1A'
WHERE mac_address = '12.34.56.78';
Notes
It seems easier in this case to extend the relational schema, as there are really only two simple commands. However, assuming a part that is part of another part is not also (directly) part of a computer, we will get a lot of NULL-values in the part-table. Also, queries over this data may need to handle these NULL-values in some way, thus complicating the data access.
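As a small illustration of the NULL handling mentioned above, the following Python sketch (standard-library sqlite3, which is an assumption and not part of the exercise) loads the four parts and uses COALESCE to find the direct whole of each part, whichever of the two columns holds it. To keep the sketch short, the computer table and its foreign key are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part(
    registration_number text PRIMARY KEY,
    label text NOT NULL,
    partOfComputer text,  -- foreign key to computer omitted in this sketch
    partOfPart text REFERENCES part(registration_number)
);
INSERT INTO part VALUES
    ('987.654.321', 'Motherboard XYZ', '12.34.56.78', NULL),
    ('111.222.333', 'RAM 8GB', '12.34.56.78', NULL),
    ('192.837.465', 'CPU 5.2 GHz', NULL, '987.654.321'),
    ('999.888.777', 'GPU 4 GHz', NULL, '987.654.321');
""")

# COALESCE returns its first non-NULL argument, so every part gets one "whole".
rows = conn.execute("""
    SELECT label, COALESCE(partOfComputer, partOfPart) AS whole
    FROM part
    ORDER BY label
""").fetchall()

for label, whole in rows:
    print(label, "is part of", whole)
```

Every query that asks "what is this part a part of?" has to repeat this kind of workaround, which is the complication the note refers to.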
c)
Triples:
Add triples:
ex-o:Component ex-o:hasPartsOfType ex-o:Component ;
ex-o:individualsUniquelyIdentifiedBy ex-o:hasID ;
ex-o:hasOneOf ex-o:hasLabel .
ex-o:Cluster ex-o:subClassOf ex-o:Component .
ex-o:Computer ex-o:subClassOf ex-o:Component .
ex-o:Part ex-o:subClassOf ex-o:Component .
If one instead made a separate property for each type of has-part relationship in a) and b), i.e. ex-o:computerHasPart and ex-o:partHasPart, then one can either delete these triples and add ex-o:hasPart-triples, or simply add the following triples:
ex-o:computerHasPart ex-o:subPropertyOf ex-o:hasPart .
ex-o:partHasPart ex-o:subPropertyOf ex-o:hasPart .
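The sub-property triples only help if some program gives them meaning. Below is a minimal Python sketch of such a program, assuming that the sub-property relation is meant to behave like rdfs:subPropertyOf (if p is a sub-property of q, every p-statement is also a q-statement). Triples are modelled as plain tuples of strings, the function name `apply_subproperty_rule` is hypothetical, and the two data triples are the has-part statements from a) and b) rewritten with the specialised properties.

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
triples = {
    ("ex-o:computerHasPart", "ex-o:subPropertyOf", "ex-o:hasPart"),
    ("ex-o:partHasPart", "ex-o:subPropertyOf", "ex-o:hasPart"),
    ("ex:computer", "ex-o:computerHasPart", "ex:motherboard"),
    ("ex:motherboard", "ex-o:partHasPart", "ex:cpu"),
}

def apply_subproperty_rule(triples):
    """If p subPropertyOf q and (s, p, o) holds, add (s, q, o); repeat until stable."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_of = {(s, o) for (s, p, o) in inferred if p == "ex-o:subPropertyOf"}
        for (s, p, o) in list(inferred):
            for (sub, sup) in sub_of:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred

closure = apply_subproperty_rule(triples)
has_part = sorted((s, o) for (s, p, o) in closure if p == "ex-o:hasPart")
print(has_part)
```

With this rule in place, a query for ex-o:hasPart finds both relationships even though neither was stated with that property.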
Relational:
Either by making the new terms views:
CREATE VIEW component(id, label, ctype) AS
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL
SELECT mac_address, label, 'computer' FROM computer
UNION ALL
SELECT registration_number, label, 'part' FROM part;
CREATE VIEW partOf(part, whole) AS
SELECT mac_address, partOfCluster
FROM computer
UNION ALL
SELECT registration_number, partOfComputer
FROM part
WHERE partOfComputer IS NOT NULL
UNION ALL
SELECT registration_number, partOfPart
FROM part
WHERE partOfPart IS NOT NULL;
or, by making the old terms views:
CREATE TABLE component(
id text PRIMARY KEY,
label text NOT NULL,
ctype text NOT NULL
);
INSERT INTO component
SELECT clusterID, label, 'cluster' FROM cluster
UNION ALL
SELECT mac_address, label, 'computer' FROM computer
UNION ALL
SELECT registration_number, label, 'part' FROM part;
CREATE TABLE partOf(
part text REFERENCES component(id),
whole text REFERENCES component(id)
);
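INSERT INTO partOf
SELECT mac_address, partOfCluster
FROM computer
WHERE partOfCluster IS NOT NULL
UNION ALL
SELECT registration_number, partOfComputer
FROM part
WHERE partOfComputer IS NOT NULL
UNION ALL
SELECT registration_number, partOfPart
FROM part
WHERE partOfPart IS NOT NULL;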
DROP TABLE part;
DROP TABLE computer;
DROP TABLE cluster;
CREATE VIEW part(registration_number, label, partOfComputer, partOfPart) AS
SELECT c.id, c.label, p.whole, NULL
FROM component AS c JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'computer')
UNION ALL
SELECT c.id, c.label, NULL, p.whole
FROM component AS c JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'part' AND p.whole IN (SELECT id FROM component WHERE ctype = 'part');
CREATE VIEW computer(mac_address, label, partOfCluster) AS
SELECT c.id, c.label, p.whole
FROM component AS c JOIN partOf AS p ON (c.id = p.part)
WHERE c.ctype = 'computer';
CREATE VIEW cluster(clusterId, label) AS
SELECT id, label
FROM component
WHERE ctype = 'cluster';
Notes
The triples are, compared to the relational schema, easier to extend. If other types of components (e.g. servers, data centers, etc.) were to be added to the hierarchy, the triplestore only needs a few added triples.
Both of the relational approaches require more substantial updates (the first would need to drop and recreate updated views if new types of components are to be added, and the second is easier to maintain in light of new types of components, but is a rather large rewrite of the original schema).
However, the relational schema is easier to utilize and verify, as the metadata is expressed as constraints and is therefore automatically checked by any RDBMS. For the triples, the semantics of the vocabulary either need to be implemented directly, or defined in terms of an existing vocabulary with implemented reasoners. Reasoners can typically check for consistency, but often cannot do more complex forms of constraint checking.
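To make "implemented directly" concrete, here is a minimal Python sketch of a checker for ex-o:individualsUniquelyIdentifiedBy, read as "no two individuals of the class share a value for the key property". That reading, the tuple representation and the function name `key_violations` are assumptions. The registration number of ex:cpu is deliberately changed so that it clashes with ex:ram.

```python
# Triples as (subject, predicate, object) tuples; prefixed names are plain strings.
triples = {
    ("ex-o:Part", "ex-o:individualsUniquelyIdentifiedBy", "ex-o:hasRegNr"),
    ("ex:motherboard", "rdf:type", "ex-o:Part"),
    ("ex:motherboard", "ex-o:hasRegNr", "987.654.321"),
    ("ex:ram", "rdf:type", "ex-o:Part"),
    ("ex:ram", "ex-o:hasRegNr", "111.222.333"),
    ("ex:cpu", "rdf:type", "ex-o:Part"),
    ("ex:cpu", "ex-o:hasRegNr", "111.222.333"),  # deliberate clash with ex:ram
}

def key_violations(triples):
    """Return (class, key value, individuals) for key values shared by several individuals."""
    violations = []
    keys = [(s, o) for (s, p, o) in triples
            if p == "ex-o:individualsUniquelyIdentifiedBy"]
    for cls, key_prop in keys:
        members = {s for (s, p, o) in triples if p == "rdf:type" and o == cls}
        owners = {}  # key value -> set of individuals carrying it
        for (s, p, o) in triples:
            if p == key_prop and s in members:
                owners.setdefault(o, set()).add(s)
        for value, individuals in owners.items():
            if len(individuals) > 1:
                violations.append((cls, value, sorted(individuals)))
    return sorted(violations)

violations = key_violations(triples)
print(violations)
```

This is the kind of check a PRIMARY KEY constraint gives for free in the relational schema, and which has to be written and maintained separately for a home-made vocabulary.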
The point of this exercise was really to get a feel for different data structures, and the properties they have. We will learn more about semantics, reasoning, etc. later in the course!
2. Implicit data
ex:carl ex:hasGivenName "Carl" ; # The individual's given/first name is "Carl"
ex:hasSureName "Smith" ; # The individual's surname/last name is "Smith"
ex:livesAtStreet ex:Streetroad ; # The individual lives at a street named "Streetroad"
ex:livesInCity ex:oslo ; # The individual lives in a city named "Oslo"
rdf:type ex:Person ; # The individual denotes a person
ex:hasSameSureNameAs ex:mary ; # The person has the same surname as ex:mary
ex:hasDifferentFirstNameAs ex:mary ; # The person has a different given name than ex:mary
ex:livesInSameCityWith ex:mary ; # The person lives in the same city as ex:mary
ex:isDifferentFrom ex:mary ; # The person is not the same person as ex:mary (assuming a person can only live in one house or have only one name)
rdf:type ex:NotHomeless, ex:NamedPerson . # The person is not homeless and has a name
# Can express many of the same or similar statements for ex:mary as above.
# However, can also start describing e.g. the addresses:
[] rdf:type ex:Address ;
ex:address "Streetroad 1, 1234, Oslo" ;
ex:inZipCode "1234" ;
ex:onStreet ex:Streetroad ;
ex:inCity ex:oslo .
# And then describe streets:
ex:Streetroad rdf:type ex:Street ;
ex:name "Streetroad" ;
ex:inCity ex:oslo .
# And then cities:
ex:oslo rdf:type ex:City ;
ex:name "Oslo" ;
ex:hasCitizens ex:carl, ex:mary .
# And so on...
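Several of the statements above are implied by others, so they could be derived instead of stated. A minimal Python sketch of one such derivation follows, with triples as plain tuples. It assumes that ex:mary also lives in ex:oslo (as the ex:hasCitizens triple suggests) and that different names denote different individuals. The function name `same_city_pairs` is hypothetical.

```python
# Explicit facts as (subject, predicate, object) tuples.
facts = {
    ("ex:carl", "ex:livesInCity", "ex:oslo"),
    ("ex:mary", "ex:livesInCity", "ex:oslo"),  # assumed, see lead-in
}

def same_city_pairs(facts):
    """Derive ex:livesInSameCityWith between distinct individuals sharing a city."""
    lives_in = [(s, o) for (s, p, o) in facts if p == "ex:livesInCity"]
    return {
        (a, "ex:livesInSameCityWith", b)
        for (a, city_a) in lives_in
        for (b, city_b) in lives_in
        if a != b and city_a == city_b
    }

derived = same_city_pairs(facts)
for triple in sorted(derived):
    print(triple)
```

The derived relation is symmetric by construction, so ex:mary gets the corresponding statement about ex:carl without it being written down.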
3. Meaning from structure in the relational model
ping is an entity type identified by a bip, with an attribute pip that is an integer. A ping can be kip-related to (at most one) pong.
pong is an entity type identified by a bop, with an integer attribute pop.
pings and pongs can be bang-related, where the relationship has a textual kap-attribute.
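For reference, the reading above corresponds to a schema roughly like the one in the following Python sketch (standard-library sqlite3). This is a reconstruction from the description only, not the schema given in the exercise: the key types and the column names `ping` and `pong` in the bang table are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pong(
    bop integer PRIMARY KEY,  -- key type is a guess
    pop integer
);
CREATE TABLE ping(
    bip integer PRIMARY KEY,  -- key type is a guess
    pip integer,
    kip integer REFERENCES pong(bop)  -- a single column: at most one pong per ping
);
CREATE TABLE bang(
    ping integer REFERENCES ping(bip),  -- column name is hypothetical
    pong integer REFERENCES pong(bop),  -- column name is hypothetical
    kap text
);
""")

# List the column names of each table, in declaration order.
columns = {
    table: [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    for table in ("ping", "pong", "bang")
}
print(columns)
```

The structure alone tells us what is an entity, an attribute or a relationship, even though the names themselves carry no meaning.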
4. Meaning from structure in RDF
- We know that ex:pong is a property. Furthermore, ex:pong is not irreflexive (since ex:pong relates ex:peng to itself), and thus cannot be e.g. a “different from”, “strictly less than”, “mother of” or similar property. ex:pong can also relate other properties. ex:pang is also a property.
- If ex:pong relates only properties, this means that ex:peng is also a property, and thus ex:pang can relate things to properties (as ex:ping is ex:pang-related to ex:peng).
- If ex:pong denotes equality, then so does ex:pang, as ex:pong is ex:pong-related to ex:pang. This then implies that ex:ping equals ex:peng.
- If ex:pong denotes superproperty, then ex:pang is a subproperty of the superproperty predicate, and thus at least also denotes superproperty. This then implies that ex:ping is a superproperty of ex:peng.