Leif Harald Karlsen
Welcome to a relatively new course!
From 2000 onward we have had a sharp increase in:
Managing data size and complexity is difficult in many areas, such as:
Big Data is characterized by:
Introduces the need for data-centric roles
From Quanthub:
“Data engineers design and build pipelines that transform and transport data into a format wherein, by the time it reaches the Data Scientists or other end users, it is in a highly usable state. These pipelines must take data from many disparate sources and collect them into a single warehouse that represents the data uniformly as a single source of truth.”
By me:
Data engineering is the art of creating data pipelines producing a coherent data source from disparate (possibly unstructured and messy) data sets, that is usable by data scientists, computer programs, and other data consumers.
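A minimal sketch of such a pipeline, for illustration only. The two inputs, their field names, and the shared `id`/`name`/`city` schema are all hypothetical; a real pipeline would read from files, databases, or APIs rather than inline strings.

```python
import csv
import io
import json

# Hypothetical raw inputs: the same kind of data in two different shapes.
CSV_SOURCE = "id,full_name,city\n1,Ann,Oslo\n2,Bob,  Bergen \n"
JSON_SOURCE = '[{"ID": "3", "name": "Cat", "location": "Trondheim"}]'


def extract_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def normalize(record, mapping):
    """Rename source-specific keys to the shared schema and trim whitespace."""
    return {target: str(record[source]).strip() for target, source in mapping.items()}


def run_pipeline():
    """Combine both sources into one uniform list of records, ordered by id."""
    csv_map = {"id": "id", "name": "full_name", "city": "city"}
    json_map = {"id": "ID", "name": "name", "city": "location"}
    rows = [normalize(r, csv_map) for r in extract_csv(CSV_SOURCE)]
    rows += [normalize(r, json_map) for r in extract_json(JSON_SOURCE)]
    return sorted(rows, key=lambda r: int(r["id"]))
```

Calling `run_pipeline()` returns records that all share the same keys, regardless of which source they came from. The per-source mapping is the only place where knowledge about a source's layout lives.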
Imperatively ask for a glass of water:
Could you please go 2 meters to the left, stretch your arm out, pull the door handle down and towards you. Then go through the door, turn left, go 4 meters forwards, turn right, …, and let go of the glass.
Declaratively ask for a glass of water:
Water is liquid H2O and a glass is melted sand shaped so that its content doesn’t pour out. Could you get me a glass of water, please?
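The same contrast, sketched for data. This is an illustration under assumptions: the `people` table and its rows are made up, and Python's built-in `sqlite3` module stands in for any declarative query engine.

```python
import sqlite3

# Hypothetical example data: (name, age) pairs.
people = [("Ann", 34), ("Bob", 19), ("Cat", 52)]


def over_30_imperative(rows):
    """Imperative: spell out HOW to find the answer, step by step."""
    result = []
    for name, age in rows:
        if age > 30:
            result.append(name)
    result.sort()
    return result


def over_30_declarative(rows):
    """Declarative: state WHAT we want and let the engine decide how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
    query = "SELECT name FROM people WHERE age > 30 ORDER BY name"
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names
```

Both functions return the same names. The imperative version fixes the loop, the filter order, and the sort; the declarative version leaves those choices to the query engine.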
A data engineer typically uses:
In this course, we will focus on:
Date | Lecture | Work |
---|---|---|
26.01 | Intro | |
02.02 | Data structure | |
09.02 | Query languages | |
16.02 | Views and rules | |
23.02 | Semantics and reasoning | |
02.03 | Templates | |
09.03 | Mapping languages | [O] |
16.03 | Constraints | [O] |
23.03 | Transforming/Structuring | [O] |
30.03 | Oblig solution | |
13.04 | Saturation | |
20.04 | Integration | |
27.04 | Ontology engineering | [P] |
04.05 | Cleaning and Validation | [P] |
11.05 | Pipelines | [P/Pr] |
25.05 | Project presentations | [P/Pr] |
01.06 | (no lecture) | [P] |
The course wiki will
I expect you to