In this lecture we will look more closely at data transformations, such as mappings, querying and reasoning, covering both mappings between formats and structures, and changes affecting the physical, logical and conceptual schema/model of the data, using the technologies we have seen so far in the course.
To mix things up a little, we are this week going to work with JSON data. In this exercise we will use weather data from the Norwegian Meteorological Institute (Yr), which is published as JSON, a key-value format heavily used for data interchange. The exercises will use this Makefile.
To start, download the Makefile into an empty folder, update the database connection variables (username, database name, host and password), and execute make.
1. Different Pipelines
The Makefile will create a folder structure similar to that of the mandatory exercise and download the weather forecast for a couple of points in Norway. This is done in the Makefile with the curl-command and Yr's Weather API as follows:
curl "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=60.10&lon=9.58" > data/met60_10-9_58.json
This downloads the JSON-file with the weather forecast for the next few hours at location (60.10, 9.58) into the file data/met60_10-9_58.json. To see what the file looks like you can of course simply open it in a text editor. However, the JSON-file is better formatted by your browser, so you can open it in e.g. Firefox by running
firefox data/met60_10-9_58.json
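For orientation, the downloaded document is a GeoJSON Feature of roughly the following shape (abridged here, with values elided; inspect the actual file for the full contents):
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [9.58, 60.1, ...] },
  "properties": {
    "meta": { ... },
    "timeseries": [
      { "time": "...",
        "data": {
          "instant": { "details": { "air_temperature": ..., "relative_humidity": ..., "wind_speed": ... } },
          "next_1_hours": { "details": { "precipitation_amount": ... } }
        } },
      ...
    ]
  }
}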
The Makefile will also download the forecast for another point, (59.20, 10.10), done with:
curl "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat=59.20&lon=10.10" > data/met59_20-10_10.json
We will now look at different pipelines for transforming this data into RDF, under the following vocabulary:
- ex-o:Forecast - class of all weather forecasts (for a given location and timepoint)
- ex-o:time - relates a weather forecast to the time of the forecast
- ex-o:lat - relates a weather forecast to the latitude of the location of the forecast
- ex-o:lon - relates a weather forecast to the longitude of the location of the forecast
- ex-o:windSpeed - relates a weather forecast to the wind speed of the forecast
- ex-o:humidity - relates a weather forecast to the relative humidity of the forecast
- ex-o:temperature - relates a weather forecast to the temperature of the forecast
- ex-o:precipitation - relates a weather forecast to the amount of precipitation of the forecast
Here, ex-o is the prefix for the vocabulary's namespace.
Below are two pipelines translating the JSON-files into RDF described with the above vocabulary. These are called pipeline1 and pipeline2 in the Makefile.
The first pipeline starts by copying the JSON-data directly into an SQL-table, and uses PostgreSQL's built-in support for handling JSON via SQL to make a view containing the desired data. Details on the functions used can be found in the PostgreSQL Manual, but a quick outline of the functionality used is as follows (a small illustrative view definition is sketched after the list):
object->"key"returns the JSON-object
object’s value for
"key"as a JSON-object
object->>"key"returns the JSON-object
object’s value for
"key"as text (which can then be cast into a proper PostgreSQL type, such as
json_array_elements(json_array)returns a table of JSON-values with one value per element in
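As a minimal sketch of how these functions combine, a view over the Yr data could be defined as below; note that the database name weather, the table raw_json and its column doc are assumptions made for illustration, not necessarily the names the Makefile actually uses:
psql -d weather <<'SQL'
-- raw_json(doc) is assumed to hold one downloaded JSON document per row;
-- in the GeoJSON geometry, coordinates are ordered [lon, lat, altitude]
CREATE VIEW forecast AS
SELECT (doc->'geometry'->'coordinates'->>0)::float AS lon,
       (doc->'geometry'->'coordinates'->>1)::float AS lat,
       ts.entry->>'time' AS time,
       (ts.entry->'data'->'instant'->'details'->>'air_temperature')::float AS temperature,
       (ts.entry->'data'->'instant'->'details'->>'wind_speed')::float AS wind_speed,
       (ts.entry->'data'->'instant'->'details'->>'relative_humidity')::float AS humidity,
       (ts.entry->'data'->'next_1_hours'->'details'->>'precipitation_amount')::float AS precipitation
FROM raw_json,
     json_array_elements(doc->'properties'->'timeseries') AS ts(entry);
SQL
Each row of this view then corresponds to one forecast, which the direct mapping turns into an RDF resource and the ontology lifts into ex-o:Forecast.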
The pipeline then uses direct mappings to translate the view into RDF and applies an ontology to map it into the target vocabulary.
The second pipeline uses a program called json2rdf which maps the JSON-file directly into RDF. It then uses rules to map the data into the desired vocabulary. Note that json2rdf does not preserve the ordering of arrays, so we need a small hack that maps the coordinates array into separate lat and lon values within the JSON-file, using the somewhat cryptic sed-command in the Makefile.
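The exact command is in the Makefile; purely as a sketch of the idea (the regular expression and the output file name here are illustrative, not copied from the Makefile), the first two coordinates can be rewritten into named keys like this:
sed -E 's/"coordinates":\[([0-9.eE+-]+),([0-9.eE+-]+)[^]]*\]/"lon":\1,"lat":\2/' \
    data/met60_10-9_58.json > data/met60_10-9_58-flat.json
After such a rewrite, json2rdf sees two ordinary key-value pairs, lon first since GeoJSON orders coordinates as longitude, latitude, so no ordering information is lost.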
For each of the following pipelines, state what type of data transformation each step is, i.e. which of the following (can be one or more for each step):
- format transformation
- structural transformation
- physical transformation
- logical transformation
- conceptual transformation
Note that it is not always clear cut which of these each step is, as it depends on where one sets the boundary for the different schemas, formats and structures.
You are advised to execute the listed commands yourself, to see what each step does and how it transforms the data. In each step only the make-command is shown, and you need to look it up in the Makefile to see what it does. For the pipelines to work correctly, you need to execute the Makefile's clean target between the two pipelines, so that the (similar) data from the two pipelines is not merged.
1. PostgreSQL + VIEWS + direct mappings + ontology
2. json2rdf + rules
2. Extending the Pipelines
In this exercise we will extend the above pipelines to add more data, and in the process see how easy they are to extend.
The data we will extend our pipelines with is also weather forecasts, but from the German equivalent of the Norwegian Meteorological Institute, namely Deutscher Wetterdienst (DWD). We will use a service called Bright Sky which provides the DWD data as JSON.
Thus, start by adding the following rule to the Makefile:
data/dwd52_0-7_60.json:
	curl "https://api.brightsky.dev/weather?lat=52&lon=7.6&date=2021-03-17" > data/dwd52_0-7_60.json
and add this file as a dependency of the target that downloads the data. This will download a JSON-dataset containing the weather forecast for the location (52.0, 7.60). Note that the date is part of the URL in the Makefile rule, so it might need to be updated depending on when you do this exercise.
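Note also that Bright Sky's JSON has a different shape from Yr's; at the time of writing the response looks roughly like this (abridged, with values elided; check the downloaded file yourself, as the API may have changed):
{
  "weather": [
    { "timestamp": "...",
      "temperature": ...,
      "relative_humidity": ...,
      "wind_speed": ...,
      "precipitation": ... },
    ...
  ],
  "sources": [ ... ]
}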
The exercise is as follows: Extend each of the above pipelines to also include this new dataset from DWD, so that the data is integrated under the same vocabulary as listed in the previous exercise.
Note how much effort one needs to put into the extension and how much can be reused from the previous pipeline.
The solution can be found here.