IN5800 – Constraints

Leif Harald Karlsen

What are Constraints?

Constraints vs. Semantics

Constraints: What must be true about the data

Semantics: What is true in the domain (of the data)

Forms of Constraints

Constraints in Relational Databases

  • Shape of data:
    • Forced structure: Tables and columns
  • Shape of values:
    • Types
    • NOT NULL
    • CHECK
  • Triggers
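
As a small illustration of value-level constraints, the sketch below uses Python's built-in sqlite3 module as a stand-in for a full relational database. The table layout and the age range are assumptions chosen for the example. Note that SQLite enforces NOT NULL and CHECK, but is lenient about column types unless a table is declared STRICT.

```python
import sqlite3


def try_insert(conn, row):
    """Insert one employee row; return None on success, else the error text."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employee(eid, ename, age) VALUES (?, ?, ?)", row
            )
        return None
    except sqlite3.IntegrityError as err:
        return str(err)


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee(
        eid   INTEGER PRIMARY KEY,
        ename TEXT NOT NULL,                          -- value must be present
        age   INTEGER CHECK (age BETWEEN 16 AND 150)  -- value must be in range
    )
    """
)

results = {
    "valid row": try_insert(conn, (1, "Per", 32)),
    "missing name": try_insert(conn, (2, None, 34)),
    "age out of range": try_insert(conn, (3, "Ole", 7)),
}
for label, error in results.items():
    print(label, "->", error or "accepted")
```

Only the first row is stored; the other two are rejected by the database itself, before any application code sees them.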

Constraints and RDF

  • RDF does not include constraints as part of the language (unlike relational databases)
  • Can be difficult to know whether your data actually looks the way you think
  • RDF’s metadata (i.e. semantics) is not sufficient as a constraint language
  • Will now look at ways of constraining RDF graphs

Constraints in Mappings: Direct Mappings

Constraints in Mappings: General Mappings

Constraints in Mappings: OTTR

  • However, OTTR templates do to some degree force the shape of the resulting RDF
  • A template instance must have the correct number of arguments
  • Each instance of the same template will result in the same graph shape
  • The type system ensures that values have the correct type
  • But no checks across templates
    • E.g. all ex:Employee-instances must be ex:worksFor-related to a ex:Company-instance

Example: OTTR as Constraints

ex:Employee[
  ottr:IRI ?person,
  xsd:string ?name,
  xsd:string ?phone,
  xsd:int ?age,
  ! ottr:IRI ?worksFor
] :: {
  o-rdf:Type(?person, ex:Employee),
  ottr:Triple(?person, ex:hasName, ?name),
  ottr:Triple(?person, ex:hasPhoneNumber, ?phone),
  ottr:Triple(?person, ex:hasAge, ?age),
  ottr:Triple(?person, ex:worksFor, ?worksFor)
} .

ex:Company[
  ottr:IRI ?company,
  xsd:string ?name
] :: {
  o-rdf:Type(?company, ex:Company),
  ottr:Triple(?company, ex:hasName, ?name)
} .
# OK

ex:Employee(ex:per, "Per", "98765432", 32, ex:uio) .
ex:Employee(ex:kari, "Kari", "123456", 34, ex:dnb) .

ex:Company(ex:uio, "Universitetet i Oslo") .
ex:Company(ex:ntnu, "NTNU") .

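# Not OK

ex:Employee(ex:ole, "Ole", 12345678, "34", ex:uio) .    # Wrong types for ?phone and ?age
ex:Employee(ex:nils, "Nils", "23456789", 34) .          # Missing argument
ex:Employee(ex:mari, "Mari", "34567890", 34, _:b) .     # Blank node for non-blank ?worksFor

ex:Company(ex:ruter, "Ruter", <>) .                     # Too many arguments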
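
The kind of checking a template signature gives can be sketched in plain Python. This is a deliberately simplified model, not how an OTTR engine is implemented: the classes IRI, Blank and Param and the function check_instance are hypothetical names, Python types stand in for the OTTR/XSD types, and blank nodes are only accepted for IRI parameters that are not marked non-blank.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Blank:
    label: str


@dataclass(frozen=True)
class Param:
    name: str
    py_type: type            # Python stand-in for the OTTR/XSD type
    non_blank: bool = False  # corresponds to the "!" modifier


# Hypothetical Python mirror of the ex:Employee signature.
EMPLOYEE = [
    Param("person", IRI),
    Param("name", str),
    Param("phone", str),
    Param("age", int),
    Param("worksFor", IRI, non_blank=True),
]


def check_instance(signature, args):
    """Return a list of violations for one template instance (empty if OK)."""
    if len(args) != len(signature):
        return [f"expected {len(signature)} arguments, got {len(args)}"]
    errors = []
    for param, arg in zip(signature, args):
        if isinstance(arg, Blank):
            if param.py_type is not IRI:
                errors.append(f"?{param.name} expects {param.py_type.__name__}")
            elif param.non_blank:
                errors.append(f"?{param.name} must not be a blank node")
        elif not isinstance(arg, param.py_type):
            errors.append(f"?{param.name} expects {param.py_type.__name__}")
    return errors


instances = {
    "per": (IRI("ex:per"), "Per", "98765432", 32, IRI("ex:uio")),
    "ole": (IRI("ex:ole"), "Ole", 12345678, "34", IRI("ex:uio")),
    "nils": (IRI("ex:nils"), "Nils", "23456789", 34),
    "mari": (IRI("ex:mari"), "Mari", "34567890", 34, Blank("b")),
}
for label, args in instances.items():
    print(label, check_instance(EMPLOYEE, args))
```

Each check looks at a single instance in isolation, which is why a rule such as "the company an employee works for must itself be declared" cannot be expressed this way.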

Constraints in RDF: SHACL – Example


ex:Employee a owl:Class .
ex:Company a owl:Class .

ex:hasName a owl:DatatypeProperty;
    rdfs:range xsd:string .

ex:hasAge a owl:DatatypeProperty;
    rdfs:domain ex:Employee; 
    rdfs:range xsd:integer .

ex:hasPhoneNumber a owl:DatatypeProperty;
    rdfs:domain ex:Employee; 
    rdfs:range xsd:string .

ex:worksFor a owl:ObjectProperty;
    rdfs:domain ex:Employee; 
    rdfs:range ex:Company .

ex:contactPerson a owl:ObjectProperty;
    rdfs:domain ex:Company; 
    rdfs:range ex:Person .

SHACL Shapes:

ex:EmployeeShape
    a sh:NodeShape ;
    sh:targetClass ex:Employee ;        # Applies to all individuals of ex:Employee
    sh:property [                 
        sh:path ex:hasPhoneNumber ;           
        sh:datatype xsd:string ;
        sh:pattern "^\\d{8}$" ;         # Phone numbers are strings of 8 digits
    ] ;
    sh:property [                 
        sh:path ex:hasName ;           
        sh:minCount 1 ;                 # All employees must have at least one name
        sh:maxCount 1 ;                 # All employees must have at most one name
        sh:datatype xsd:string ;
        sh:pattern "^[A-Z][a-z]+$" ;    # Names start with a capital, then lower-case letters
    ] ;
    sh:property [
        sh:path ex:hasAge ;
        sh:minInclusive 16 ;            # Age is >= 16
        sh:maxInclusive 150 ;           # Age is <= 150
    ] ;
    sh:property [                 
        sh:path ex:worksFor ;
        sh:minCount 1 ;                 # All employees must work for a company
        sh:node ex:CompanyShape ;       # The object must conform to the ex:CompanyShape
    ] .

ex:CompanyShape
    a sh:NodeShape ;
    sh:targetClass ex:Company ;         # Applies to all individuals of ex:Company
    sh:property [                 
        sh:path ex:hasName ;
        sh:minCount 1 ;                 # All companies must have at least one name
        sh:datatype xsd:string ;        # But can be any string value
    ] .

Valid data:

ex:per rdf:type ex:Employee ;
    ex:hasPhoneNumber "12345678" ;
    ex:hasName "Per" ;
    ex:hasAge 32 ;
    ex:worksFor ex:UiO .

ex:kari rdf:type ex:Person ;       # Not checked (not ex:Employee)
    ex:hasPhoneNumber "98765432" ;
    ex:hasName "Kari" ;
    ex:worksFor ex:UiO .
ex:UiO rdf:type ex:Company ;
    ex:hasName "Universitetet i Oslo" ;
    ex:hasName "University of Oslo" .

Invalid data:

ex:peter rdf:type ex:Employee ;
    ex:hasPhoneNumber "12345678" ;
    ex:hasAge 32 ;
    ex:hasName "Per" .             # Missing ex:worksFor-relationship

ex:kari rdf:type ex:Employee ;
    ex:hasPhoneNumber 12345678 ;   # Wrong type
    ex:hasName "Kari" ;
    ex:hasName "Karry" ;           # Two names
    ex:worksFor ex:UiO .           # ex:UiO does not conform to the ex:CompanyShape
ex:UiO rdf:type ex:Company .       # No name

ex:NTNU rdf:type ex:Company ;
    ex:name "NTNU" .               # Wrong relationship
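
To make concrete what a validator does with such data, here is a hand-written sketch in plain Python. It is not a SHACL engine: it hard-codes three of the property constraints from the employee shape (exactly one name, phone numbers as strings of 8 digits, at least one ex:worksFor), ignores sh:node and the remaining constraints, and represents triples as tuples of strings. The function name validate_employees and the sample triples are assumptions for the example.

```python
import re
from collections import defaultdict

# Triples as plain tuples; prefixed names are kept as strings for brevity.
DATA = [
    ("ex:per", "rdf:type", "ex:Employee"),
    ("ex:per", "ex:hasName", "Per"),
    ("ex:per", "ex:hasPhoneNumber", "12345678"),
    ("ex:per", "ex:worksFor", "ex:UiO"),
    ("ex:kari", "rdf:type", "ex:Employee"),
    ("ex:kari", "ex:hasName", "Kari"),
    ("ex:kari", "ex:hasName", "Karry"),
    ("ex:kari", "ex:hasPhoneNumber", 12345678),
]


def validate_employees(triples):
    """Check a few constraints from the employee shape; return violations."""
    props = defaultdict(lambda: defaultdict(list))
    for s, p, o in triples:
        props[s][p].append(o)

    violations = []
    for node, po in props.items():
        if "ex:Employee" not in po["rdf:type"]:
            continue  # not a target node, so nothing is checked
        if len(po["ex:hasName"]) != 1:
            violations.append((node, "ex:hasName", "exactly one value required"))
        for phone in po["ex:hasPhoneNumber"]:
            if not isinstance(phone, str) or not re.fullmatch(r"\d{8}", phone):
                violations.append(
                    (node, "ex:hasPhoneNumber", "must be a string of 8 digits")
                )
        if not po["ex:worksFor"]:
            violations.append((node, "ex:worksFor", "at least one value required"))
    return violations


for violation in validate_employees(DATA):
    print(violation)
```

A real validator produces the same kind of result, a list of (focus node, path, message) entries, but reads the constraints from the shapes graph instead of having them hard-coded.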

Constraints in RDF: SHACL

  • The best way to ensure an RDF-graph is correct is by adding constraints directly to it
  • SHACL (Shapes Constraint Language) is a constraint language for RDF
  • Constraints are written in RDF and are called shapes
  • Specify the shape of the data by specifying the properties of paths through the graph
  • Specify target nodes for the constraints based on classes or properties
  • Can e.g. specify constraints on all members of a particular class
  • Shapes can reuse other shapes
  • Note: SHACL also has support for defining inference rules based on shapes

ex:EmployeeShape
    a sh:NodeShape ;
    sh:targetClass ex:Employee ;
    sh:property [                 
        sh:path ex:hasPhoneNumber ;           
        sh:datatype xsd:string ;
        sh:pattern "^\\d{8}$" ;
    ] ;
    sh:property [                 
        sh:path ex:hasName ;           
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
        sh:pattern "^[A-Z][a-z]+$" ;
    ] ;
    sh:property [
        sh:path ex:hasAge ;
        sh:minInclusive 16 ;
        sh:maxInclusive 150 ;
    ] ;
    sh:property [                 
        sh:path ex:worksFor ;
        sh:minCount 1 ;
        sh:node ex:CompanyShape ;
    ] .

SHACL vs. relational constraints: Limitations of relational constraints

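CREATE TABLE company(
    cid int PRIMARY KEY,
    cname text NOT NULL,
    contactPerson int                 -- foreign key added below
);

CREATE TABLE employee(
    eid int PRIMARY KEY,
    ename text NOT NULL,
    age int,
    phone text,
    worksFor int NOT NULL REFERENCES company(cid)
);

-- The two tables reference each other, so this key is added last
ALTER TABLE company
    ADD FOREIGN KEY (contactPerson) REFERENCES employee(eid);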

SHACL vs. relational constraints: Limitations of relational constraints

ex:CompanyShape
    # [...]
    sh:property [
      sh:path ex:hasContactPerson ;
      sh:minCount 1 ;
      sh:and (                          # sh:and is logical conjunction of shapes
        ex:EmployeeShape                # Must be employee (Note: Recursive/cyclic)
        [                               # Must have a phone number
          sh:property [
              sh:path ex:hasPhoneNumber ;
              sh:minCount 1 ;
          ]
        ]
      )
    ] .

SHACL vs. relational constraints: Triggers for relational constraints

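CREATE FUNCTION contactperson_trig_fn() RETURNS trigger AS $$
BEGIN
  IF EXISTS (SELECT 1
      FROM employee
      WHERE eid = NEW.contactPerson AND phone IS NULL) THEN
    RAISE EXCEPTION 'Company % has contact person without phone.', NEW.cname;
  END IF;
  RETURN NEW;
END;
$$ language plpgsql;

CREATE TRIGGER contactperson_trig
BEFORE INSERT OR UPDATE ON company
FOR EACH ROW EXECUTE PROCEDURE contactperson_trig_fn();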
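
The same idea can be tried out without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's sqlite3 module, where a trigger aborts the statement with RAISE(ABORT, ...) instead of calling a PL/pgSQL function. Table contents and the helper add_company are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee(
        eid   INTEGER PRIMARY KEY,
        ename TEXT NOT NULL,
        phone TEXT
    );
    CREATE TABLE company(
        cid           INTEGER PRIMARY KEY,
        cname         TEXT NOT NULL,
        contactPerson INTEGER REFERENCES employee(eid)
    );
    -- Abort the insert when the referenced contact person has no phone number.
    CREATE TRIGGER contactperson_trig
    BEFORE INSERT ON company
    WHEN EXISTS (SELECT 1 FROM employee
                 WHERE eid = NEW.contactPerson AND phone IS NULL)
    BEGIN
        SELECT RAISE(ABORT, 'contact person without phone');
    END;

    INSERT INTO employee VALUES (1, 'Per', '98765432'), (2, 'Nils', NULL);
    """
)


def add_company(cid, cname, contact):
    """Try to insert a company; return the error message if the trigger fires."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO company VALUES (?, ?, ?)", (cid, cname, contact)
            )
        return None
    except sqlite3.DatabaseError as err:
        return str(err)


print(add_company(10, "UiO", 1))    # contact person has a phone number
print(add_company(20, "Ruter", 2))  # contact person has no phone number
```

Note that a trigger on company alone does not notice when an employee's phone number is removed later; covering that case needs a second trigger on employee.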

Constraints via Queries

-- Checks that all contactPersons have a phone number
-- Every answer is a violation of the constraint: "Every contact person must have phone number."

SELECT 'Contact person for company ' || c.cname
       || ' does not have a contact number!' AS violation
FROM employee AS e JOIN company AS c ON (c.contactPerson = e.eid)
WHERE e.phone IS NULL;
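
A violation query of this kind can be run as a periodic check rather than at insert time. The sketch below does so with Python's sqlite3 module; the sample rows, the join on contactPerson and the condition on a missing phone are assumptions made to get a complete, runnable example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee(eid INTEGER PRIMARY KEY, ename TEXT NOT NULL, phone TEXT);
    CREATE TABLE company(cid INTEGER PRIMARY KEY, cname TEXT NOT NULL,
                         contactPerson INTEGER REFERENCES employee(eid));
    INSERT INTO employee VALUES (1, 'Per', '98765432'), (2, 'Nils', NULL);
    INSERT INTO company VALUES (10, 'UiO', 1), (20, 'Ruter', 2);
    """
)

# Every row returned by this query is one violation of the constraint.
VIOLATIONS = """
    SELECT 'Contact person for company ' || c.cname
           || ' does not have a contact number!' AS violation
    FROM employee AS e JOIN company AS c ON (c.contactPerson = e.eid)
    WHERE e.phone IS NULL
"""

violations = [row[0] for row in conn.execute(VIOLATIONS)]
for message in violations:
    print(message)
```

Unlike a trigger, the query does not stop bad data from entering the database; it only reports it, which is often enough for data that arrives in bulk.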

Constraints in RDF: SPARQL/SPIN

SELECT (CONCAT("ERROR: Missing name for ", STR(?p)) AS ?error)
WHERE {
    ?p rdf:type ex:Employee .
    FILTER NOT EXISTS { ?p ex:hasName ?n . }
}

Constraints vs. Semantics Revisited

Case 1:

# Mixing Turtle and OWL Manchester syntax here

:Company subClassOf
    :hasContactPerson some (:Employee and :hasPhoneNumber some :PhoneNr ).

:abc a :Company ;
    :hasContactPerson :mary .
:mary a :Employee ; :hasPhoneNumber [ rdf:type :PhoneNr ] .    #Inferred

Case 2:

:Company subClassOf
    :hasContactPerson some (:Employee and :hasPhoneNumber some :PhoneNr ).

:id rdf:type owl:InverseFunctionalProperty .

:abc :hasContactPerson :mary .
:mary a :Person ;
    :id "123" .
_:p :id "123" ;
    :hasPhoneNumber "98765432" .
:mary :hasPhoneNumber "98765432" .                 #Inferred

Note: Can specify that values should be non-blank in SHACL

Semantics as Constraints

  • Semantics gives meaning to data
  • Thus, certain combinations of statements can be considered contradictory
  • Contradictions are impossible in the real world
  • So the data (or the semantics) must be incorrect
  • Thus, a form of constraint on correctness
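
As a minimal sketch of semantics acting as a constraint, the code below flags every node that is a member of two classes declared disjoint. The disjointness of ex:Person and ex:Cat, the sample triples and the function name find_contradictions are assumptions for the example; a reasoner would derive such contradictions from the ontology instead of a hard-coded set.

```python
from itertools import combinations

# Assumed semantics: nothing can be both a person and a cat.
DISJOINT = {frozenset({"ex:Person", "ex:Cat"})}

TRIPLES = [
    ("ex:mary", "rdf:type", "ex:Person"),
    ("ex:tom", "rdf:type", "ex:Person"),
    ("ex:tom", "rdf:type", "ex:Cat"),
]


def find_contradictions(triples, disjoint=DISJOINT):
    """Return (node, class_a, class_b) for every pair of disjoint types."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    found = []
    for node, classes in sorted(types.items()):
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                found.append((node, a, b))
    return found


for node, a, b in find_contradictions(TRIPLES):
    print(f"{node} is both {a} and {b}")
```

The check says nothing about which statement is wrong, only that the data and the semantics cannot both be right.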

Semantics as Constraints

CREATE RELATION inconsistency(description text);

inconsistency('Company ' || cname || ' has contact person without phone number present!')
    <- person(pid, pname, phone), company(cid, cname, pid) : phone IS NULL;

inconsistency('IRI ' || p || ' is both a ex:Person and a ex:Cat!')
    <- rdf.type(p, qn('ex', 'Person')), rdf.type(p, qn('ex', 'Cat'));

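CREATE FUNCTION inconsistency_fn() RETURNS trigger AS $body$
BEGIN
  RAISE EXCEPTION 'Inconsistency detected: %', NEW.description;
END;
$body$ language plpgsql;

CREATE TRIGGER inconsistency_trigger
BEFORE INSERT ON inconsistency
FOR EACH ROW EXECUTE PROCEDURE inconsistency_fn();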

Semantics as Constraints

# (Mixing Turtle and OWL Manchester syntax here)

# Try to state "all companies MUST have a contact person that has a phone number"

:Company subClassOf
    :hasContactPerson some (:Employee and :hasPhoneNumber some :PhoneNr ).

:abc a :Company ;
    :hasContactPerson :mary .
:mary a :Employee ; :hasPhoneNumber [ rdf:type :PhoneNr ] .      #Inferred

Constraints and Open/Closed World

What do Constraints Really Check?

  • Obviously, correctness of (shape of) data
  • However, data is produced by a (possibly complex) pipeline
  • Thus, also checks correctness of
    • mappings (transformations and cleaning)
    • semantics
    • integration
    • assumptions about data
  • Fails at insert/processing step instead of at runtime
  • Points directly to what went wrong

When to Define Constraints


Temporal data and types

  • Data about time, i.e. when something happened
  • Can be points or intervals on the time axis
  • Always have the special point now
    • Partitioning the axis into past, present and future
  • Examples:
    • Dates
    • Timestamps, with or without timezone (e.g. 2021-01-21 10:15:00+01)
    • Unix time and other relative measures
    • Epochs, eras, geological time, astronomical time, etc.
  • Valid time vs. transaction time

Complexity of Temporal data

  • Difficult to measure time
    • No absolute scale
    • Typically measured with respect to something (e.g. position of sun, moon)
    • Depends on location, so values must be translated (e.g. between time zones)
    • Relativity (time depends on gravity, speed, etc.)
    • Different scales: Astronomical, geological, historical, daily, nano-scale
  • Contains discontinuities, ambiguities, etc.
    • Daylight saving time
    • Leap years and seconds
    • Reforms of calendars (e.g. 10 days missing in the Gregorian calendar, 05.10.1582 - 14.10.1582, New Year’s moved, etc.)
    • Start of week (Sunday in US, Monday in Europe) and week numbers
  • The type system removes many of these pains
    • Timezone-aware types
    • Operations take edge cases into account
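
A few of these points can be seen with Python's standard datetime module, used here only as one example of a temporal type system. The fixed +01:00 offset stands in for a full time zone (no daylight saving rules), and the variable names are made up. Python's dates follow the proleptic Gregorian calendar and ignore leap seconds, so the last line shows an edge case the types do not handle: historically, 4 October 1582 was followed directly by 15 October.

```python
from datetime import date, datetime, timedelta, timezone

# The timestamp 2021-01-21 10:15:00+01 as a timezone-aware value.
cet = timezone(timedelta(hours=1))
local = datetime(2021, 1, 21, 10, 15, tzinfo=cet)
utc = local.astimezone(timezone.utc)

# Aware values compare by instant, not by wall-clock digits.
same_instant = local == datetime(2021, 1, 21, 9, 15, tzinfo=timezone.utc)

# Date arithmetic accounts for leap years.
after_feb_28_leap = date(2020, 2, 28) + timedelta(days=1)
after_feb_28_common = date(2021, 2, 28) + timedelta(days=1)

# The calendar reform is not modelled: the proleptic calendar counts the
# skipped days as if they had existed.
gap = date(1582, 10, 15) - date(1582, 10, 4)

print(utc.isoformat(), same_instant)
print(after_feb_28_leap, after_feb_28_common, gap.days)
```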

Spatial data

  • Data about location and extent
  • Typically points, lines, polygons, polyhedra, etc.
    • E.g. POINT(1.0 2.0), LINESTRING(1.0 2.0, 2.1 3.2, 4.1 7.3)
  • Also multi-point, multi-lines, etc.
  • Special constant here
  • Examples:
    • Geographic and map data
    • Models of objects (cars, organs, etc.)
    • Geological, astronomical, archaeological, etc.

Complexity of Spatial data

  • Lots of functions, operations and relations
  • Complicated algorithms
  • Use of floats complicates computations
    • E.g. it might be impossible to construct the exact intersection of two intersecting objects
  • We live on a globe, which complicates maps (e.g. multiple projections)
  • Contains lots of implicit data
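
The float problem shows up even in the simplest geometric predicate. The sketch below tests whether a point lies on a line using a cross product, first with floats and then with exact rationals from Python's fractions module; the coordinates and the function name cross are chosen for the example.

```python
from fractions import Fraction


def cross(a, b, p):
    """Twice the signed area of triangle a, b, p; zero iff p is on line ab."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


# With floats, 0.1 + 0.2 is not exactly 0.3, so a point that should lie on
# the line through (0, 0) and (0.3, 0.3) is reported as off the line.
float_result = cross((0.0, 0.0), (0.3, 0.3), (0.1 + 0.2, 0.3))

# Exact rational arithmetic gives the expected answer.
tenth = Fraction(1, 10)
exact_result = cross(
    (0, 0), (3 * tenth, 3 * tenth), (tenth + 2 * tenth, 3 * tenth)
)

print("float: on line?", float_result == 0)
print("exact: on line?", exact_result == 0)
```

Spatial libraries typically work around this with tolerances or snapping to a grid, which is one reason two systems can disagree on whether geometries touch.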

Spatial data in Query Languages

  • Spatial data is complex, so query languages need extensions
  • PostGIS is a state-of-the-art geospatial extension for PostgreSQL
  • GeoSPARQL is a similar extension for SPARQL
  • GeoSPARQL is often not, or only partially, supported by triplestores/SPARQL implementations
  • Otherwise, need to translate quantitative geospatial data into qualitative
    • A useful exercise
  • Also plays better with semantics
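
Translating quantitative data into qualitative relations can be sketched as follows. Everything here is an assumption for illustration: axis-aligned bounding boxes stand in for real geometries, the coordinates are rough made-up values, and ex:within, ex:contains, ex:overlaps and ex:disjointFrom are hypothetical property names, not GeoSPARQL terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box, a crude stand-in for a real geometry."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def covers(outer, inner):
    """True if inner lies entirely inside outer (boundaries included)."""
    return (outer.xmin <= inner.xmin and inner.xmax <= outer.xmax
            and outer.ymin <= inner.ymin and inner.ymax <= outer.ymax)


def relation(a, b):
    """Name one qualitative relation from a to b (hypothetical vocabulary)."""
    if a.xmax < b.xmin or b.xmax < a.xmin or a.ymax < b.ymin or b.ymax < a.ymin:
        return "ex:disjointFrom"
    if covers(b, a):
        return "ex:within"
    if covers(a, b):
        return "ex:contains"
    return "ex:overlaps"


def qualitative_triples(regions):
    """Turn coordinates into triples that plain SPARQL can query."""
    return [
        (s, relation(a, b), o)
        for s, a in regions.items()
        for o, b in regions.items()
        if s != o
    ]


REGIONS = {
    "ex:oslo": Box(10, 59, 11, 60),
    "ex:norway": Box(4, 57, 32, 72),
    "ex:iceland": Box(-25, 63, -13, 67),
}
for triple in qualitative_triples(REGIONS):
    print(triple)
```

Once the relations are materialised as triples, they can be queried without spatial extensions and combined with ordinary class and property axioms.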