This lecture gives an introduction to declarative data engineering and an overview of the course, with practical information.
Relevant Wiki pages
Start by reading up on the fundamental technologies by following the links provided on each of the wiki-pages listed above. You do not need advanced knowledge of any of these technologies, but you should know the basics so that you are able to learn more advanced features if necessary.
The exercises below will test your basic knowledge of the fundamental technologies listed above.
Each exercise is prefixed with the technology (from the above list) you should use.
Before you start, make sure you have access to a terminal with Bash. Linux and Mac machines have this installed by default. If you are using Windows, you can either use Windows Subsystem for Linux or use PuTTY to log into a Linux machine at IFI remotely. Also make sure you have a text editor available on the machine you are working on (e.g. Vim, Nano, Emacs, Sublime, Atom).
Then, make a new Git repository to do the exercises in. Follow the steps below to do this:
Go to UiO’s GitHub and log in with your UiO username and password
Click on the green “New”-button on the left side of the page
Fill in a name for your repo, e.g. IN5800-intro, and click on the green “Create repository”-button
Open up a terminal (running Bash)
Execute the following command to clone the Git repository to your computer:
git clone <url>
Here, <url> is the URL of your newly created repository (you can simply copy-paste the URL from your browser right after you have created the repository). For example, I would run:
git clone https://github.uio.no/leifhka/IN5800-intro
Type in your UiO username and password when prompted for this.
Congratulations! :D You have now made a Git repository and cloned it to your computer.
Exercise 1: Make a README-file
We will start by making a simple README-file in the repo and push it to the remote repo.
- [Bash] Change your working directory to the newly cloned repo.
- [Markdown] Use your favorite text editor to create a new Markdown file with the name README.md containing a header "Readme" and the text "This repo is used for the intro exercises in IN5800." (You can see a stylized view of your README-file on the main page of your repo, at the same URL you used to clone it.)
- [Git] Add, commit and push the changes to the repo with the commit
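The steps of this exercise can be sketched roughly as follows. This is only an illustration under some assumptions: a local bare repository (remote.git) stands in for the remote on UiO's GitHub so that the sketch runs without a login, and the commit message, user name and e-mail address are placeholders, not the ones asked for in the exercise.

```shell
# Work in a scratch directory so nothing in your real repo is touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Assumption: a local bare repository plays the role of the remote repo.
git init -q --bare remote.git
git clone -q remote.git IN5800-intro

# [Bash] Change the working directory to the newly cloned repo.
cd IN5800-intro

# [Markdown] A header line ("# ...") followed by a paragraph of text.
printf '# Readme\n\nThis repo is used for the intro exercises in IN5800.\n' > README.md

# [Git] Stage, commit and push the change.
# The identity and the commit message below are placeholders.
git add README.md
git -c user.name="Example User" -c user.email="user@example.org" \
    commit -q -m "Add README"
git push -q origin HEAD
```

In your own clone you would skip the first half and simply run the add, commit and push commands inside the repository; Git then uses the identity from your own configuration.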
Exercise 2: Download and manage files
We are now interested in the info about our course (IN5800) contained in a data file.
- [Bash] Make new directories called
- [Bash] Download the Zip-file at https://leifhka.org/in5800/lectures/intro/data.zip
into the newly created
- [Bash] Unzip the folder and move the unzipped file (data.csv) into the
- [Bash] Use cat, pipe (|) and grep to print out the line starting with in5800 (hint: the regular expression ^in5800.* will match lines starting with in5800)
- [Bash] Remove the folders
data and all files contained in them
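Below is a small sketch of the Bash building blocks used in this exercise: making directories, moving a file, piping cat into grep, and removing directories again. To keep it runnable offline, it does not download or unzip anything. The directory name incoming and the contents of the sample data.csv are made up for illustration; the real file will look different.

```shell
# Work in a scratch directory.
workdir="$(mktemp -d)"
cd "$workdir"

# Make two directories (the name "incoming" is a hypothetical placeholder).
mkdir -p incoming data

# Create a tiny sample file standing in for the downloaded data (made-up rows).
printf 'in1000,sample row A\nin5800,sample row B\nin2090,sample row C\n' > incoming/data.csv

# Move the file into the other directory.
mv incoming/data.csv data/

# Pipe the file contents into grep and keep only lines starting with in5800.
match="$(cat data/data.csv | grep '^in5800.*')"
echo "$match"

# Remove both directories and everything inside them.
rm -r incoming data
```

The ^ anchors the match at the start of the line, so a line that merely contains in5800 somewhere in the middle would not be printed.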
Exercise 3: Make Makefile
As the data in the CSV-file from the previous exercise might change, we want to automate the steps done above, so we will make a Makefile for this.
- [Make] Open up a new file named Makefile in your favorite text editor and create one Make-rule per sub-exercise in the previous exercise
- Be sure to include proper dependencies in your rules (e.g. the rule for the second subexercise should depend on the rule for the first)
- Let the rule for the fourth subexercise be named in5800_data and the final rule be named
- [Make] Execute the in5800_data rule (note that when you execute a Makefile, it also outputs all the Bash-commands it executes)
- [Make] Execute the
- [Markdown] Add a new subheader
README.md-file containing the text
"Below is a list of useful commands:" (where "commands" is bold) followed by a list containing the two items
- [Git] Add, commit and push the changes done to the repo with the
Add a Makefile to automate the information extraction.
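To illustrate how rules and dependencies fit together, here is a minimal sketch of a Makefile with chained rules. It is not the solution to the exercise: the rule names (dirs, sample, show, clean) and the sample data are hypothetical, and the download step is replaced by writing a small file locally. The Makefile is written with printf so that each recipe line is guaranteed to start with a tab character, which Make requires.

```shell
# Work in a scratch directory.
workdir="$(mktemp -d)"
cd "$workdir"

# Write the Makefile; \t produces the tab that must start every recipe line.
{
  printf '.PHONY: dirs sample show clean\n\n'
  printf 'dirs:\n\tmkdir -p data\n\n'
  printf 'sample: dirs\n'
  printf '\techo "in1000,sample row A" > data/data.csv\n'
  printf '\techo "in5800,sample row B" >> data/data.csv\n\n'
  printf 'show: sample\n\tcat data/data.csv | grep "^in5800.*"\n\n'
  printf 'clean:\n\trm -rf data\n'
} > Makefile

# Running "show" first runs "sample", which in turn first runs "dirs".
make_output="$(make show)"
echo "$make_output"

# The clean rule removes what the other rules created.
make clean
```

Because show depends on sample, and sample depends on dirs, a single make show runs all three in the right order, and Make prints each command before executing it.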
Exercise 4: Branching and Make-variables
It is often nice to keep URLs out of the rules in Makefiles, and
rather put them in separate variables. Thus, you will now fix your
Makefile, but to be on the safe side, let's make the changes in a separate
Git-branch and test it before merging it into your
- [Git] Create a new branch (and switch to it) with the name
- [Make] Make a new variable in your
data_url) that contains the URL of the ZIP-file to download and replace the URL in the make-rule with the use of the newly made variable instead
- [Make] Check that everything works by executing the
- [Git] Add, commit and push your changes to the
- [Git] Switch back to your
master-branch and merge it with the
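The workflow in this exercise can be sketched as below, under a few assumptions. The branch name use-variables, the rule name show_url, the commit messages and the Git identity are all hypothetical placeholders. The sketch uses a fresh local repository without a remote, so the push step is left out, and the rule only prints the URL instead of downloading anything.

```shell
# Work in a fresh scratch repository.
workdir="$(mktemp -d)"
cd "$workdir"
git init -q .

# Remember the name of the initial branch (master or main, depending on setup).
main_branch="$(git symbolic-ref --short HEAD)"

# Small helper: commit with a placeholder identity.
commit() {
  git -c user.name="Example User" -c user.email="user@example.org" \
      commit -q -m "$1"
}

# Starting point: a rule with the URL written directly in the recipe.
printf 'show_url:\n\t@echo https://leifhka.org/in5800/lectures/intro/data.zip\n' > Makefile
git add Makefile
commit "Add Makefile with hard-coded URL"

# [Git] Create a new branch and switch to it.
git checkout -q -b use-variables

# [Make] Put the URL in a variable and refer to it with $(data_url).
printf 'data_url = https://leifhka.org/in5800/lectures/intro/data.zip\n\n' > Makefile
printf 'show_url:\n\t@echo $(data_url)\n' >> Makefile

# [Make] Check that the rule still works before committing.
make show_url

# [Git] Commit on the branch (with a remote you would also push here).
git add Makefile
commit "Move URL into a Make variable"

# [Git] Switch back to the initial branch and merge the work in.
git checkout -q "$main_branch"
git merge -q use-variables
```

Since the initial branch has no new commits of its own, the merge is a fast-forward: the branch pointer simply moves to the latest commit from use-variables.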
A solution to the exercises is provided here. It is wise to make an honest attempt at the exercises before consulting the solution ;)