Data Transformation
A data transformation is any function or procedure that takes data as input and produces data as output. This makes it a central concept in data engineering: most of the other techniques discussed in this course can be viewed as a type of data transformation.
Definitions
This section lists the definitions used in this course to describe the different types of data transformations.
Data transformation: Any function/procedure that takes data as input and produces data as output
- The input is called the source
- The output is called the target
Format: The concrete layout of data in memory or on disk, together with procedures for manipulating it
Data structure: Mathematical description of the layout and manipulation of data + data representation approach + minimal semantics
Conceptual data schema: A description of the entities/individuals, attributes, relationships, annotations, etc. that the data describes
Logical data schema: A description of data in terms of a particular data structure
Physical data schema: A description of data in terms of a particular format
Format transformation: Change data from one format to another
Structural transformation: Change data from one data structure to another
Conceptual transformation: Change the conceptual schema of the data
Logical/physical transformation: Change the logical/physical schema of the data within the same structure/format
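To make the difference between these kinds of transformation concrete, here is a minimal sketch in Python. It assumes a small tabular source in CSV; the function names `csv_to_json` and `rename_column` and the sample data are hypothetical and chosen only for illustration. The first function changes the format (CSV to JSON) while the data stays a list of records; the second changes the logical schema (a column name) while the structure stays the same.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Format transformation: CSV text in, JSON text out.

    The rows and columns are unchanged; only the concrete layout differs.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


def rename_column(rows: list[dict], old: str, new: str) -> list[dict]:
    """Logical transformation: the schema changes (a column is renamed),
    but the data structure (a list of records) is the same."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in rows]


# Hypothetical sample data.
source = "id,name\n1,Ada\n2,Alan\n"
target = csv_to_json(source)
renamed = rename_column(json.loads(target), "name", "full_name")
```

Note that `csv.DictReader` reads every value as a string, so a real format transformation would usually also need to decide how to handle data types.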
Direct data transformation: A data transformation that directly changes the data
Indirect data transformation: A data transformation that implicitly changes the data by changing the schema
Data transformation via indirect data transformations is often called data mediation
Destructuring: A structural transformation going from a structurally rich to a structurally poor structure (e.g., relational to triples)
Structuring: A structural transformation going from a structurally poor to a structurally rich structure (e.g., triples to relational)
Lifting: A structural transformation going from a semantically poor to a semantically rich structure (e.g., relational to OWL)
Lowering: A structural transformation going from a semantically rich to a semantically poor structure (e.g., OWL to relational)
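Destructuring and structuring can be sketched as follows, assuming a single relational table with a key column and plain (subject, predicate, object) tuples as the triple structure. The functions `destructure` and `structure`, the `table/key` naming of subjects, and the sample data are all hypothetical; this is one possible encoding, not a standard mapping.

```python
def destructure(table_name: str, rows: list[dict], key: str) -> list[tuple]:
    """Relational rows -> (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        # Assumption: a subject is identified by table name plus key value.
        subject = f"{table_name}/{row[key]}"
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


def structure(triples: list[tuple], key: str) -> list[dict]:
    """Triples -> relational rows.

    Assumes every predicate has at most one value per subject.
    """
    rows = {}
    for subject, predicate, obj in triples:
        # Recover the key value from the subject identifier.
        row = rows.setdefault(subject, {key: subject.split("/", 1)[1]})
        row[predicate] = obj
    return list(rows.values())


# Hypothetical sample data.
people = [
    {"id": "1", "name": "Ada", "born": "1815"},
    {"id": "2", "name": "Alan", "born": "1912"},
]
triples = destructure("person", people, key="id")
roundtrip = structure(triples, key="id")
```

The triples no longer record which columns belong together in a table; `structure` has to reintroduce that grouping by collecting all triples with the same subject. A row containing only the key column would produce no triples and so would be lost in the round trip.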
Stored transformation: Target data is computed and stored explicitly in a data format
Virtual transformation: Queries to the target are rewritten using the definition of the transformation to equivalent queries over the source data
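The difference between stored and virtual transformations can be illustrated with Python's built-in `sqlite3` module. In this sketch, a table created with `CREATE TABLE ... AS SELECT` plays the role of a stored transformation, and a view plays the role of a virtual one, since the database answers queries over the view by evaluating its definition over the source table. The table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])

# Stored transformation: the target is computed once and saved as its own table.
conn.execute(
    "CREATE TABLE person_stored AS SELECT id, name AS full_name FROM person"
)

# Virtual transformation: only the definition is saved; queries over the view
# are answered from the source table at query time.
conn.execute(
    "CREATE VIEW person_virtual AS SELECT id, name AS full_name FROM person"
)

# Change the source after both targets have been defined.
conn.execute("INSERT INTO person VALUES (3, 'Grace')")

stored_count = conn.execute("SELECT COUNT(*) FROM person_stored").fetchone()[0]
virtual_count = conn.execute("SELECT COUNT(*) FROM person_virtual").fetchone()[0]
```

After the source changes, the stored target still holds the two rows it was computed from, while the virtual target reflects all three. Stored targets must therefore be recomputed to stay current, whereas virtual targets pay the cost of the transformation at query time.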