Lecture 1: Introduction and Course Overview
This lecture gives an introduction to declarative data engineering and an overview of the course, with practical information.
Relevant Wiki pages
Exercises
Start by reading up on the fundamental technologies by following the links provided on each of the wiki pages listed above. You do not need advanced knowledge of any of these technologies, but you should know the basics so that you can learn more advanced features when necessary.
The exercises below will test your basic knowledge of the fundamental technologies listed above.
Each exercise is prefixed with the technology (from the above list) you should use.
Setup
Before you start, make sure you have access to a terminal with Bash. Linux and Mac machines have this installed by default. If you are using Windows, you can either use Windows Subsystem for Linux or use PuTTY to log into a Linux machine at IFI remotely. Also make sure you have a text editor available on the machine you are working on (e.g. Vim, Nano, Emacs, Sublime, Atom).
Then, make a new Git repository to do the exercises in. Follow the steps below to do this:
- Go to UiO’s GitHub and log in with your UiO username and password
- Click on the green “New”-button on the left side of the page
- Fill in a name for your repo, e.g. `IN5800-intro`, and click on the green “Create repository”-button
- Open up a terminal (running Bash)
- Execute the following command to clone the Git repository to your computer:
  `git clone <url>`
  where `<url>` is the URL of your newly created repository (you can simply copy-paste the URL from your browser right after you have created the repository). E.g. I would run:
  `git clone https://github.uio.no/leifhka/IN5800-intro`
- Type in your UiO username and password when prompted for this.
Congratulations! :D You have now made a Git repository and cloned it to your computer.
Exercise 1: Make a README-file
We will start by making a simple `README`-file in the repo and pushing it to the remote repo.
- [Bash] Change your working directory to the newly cloned repo.
- [Markdown] Use your favorite text editor to create a new Markdown file with the name `README.md` containing a header `Readme` and the text `This repo is used for the intro exercises in IN5800.` (You can see a stylized view of your `README`-file on the main page of your repo, at the same URL you used to clone it.)
- [Git] Add, commit and push the changes to the repo with the commit message `Add README-file.`
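The add, commit and push cycle can be tried out safely before touching your real repo. The sketch below is only an illustration under stated assumptions: it works in a throwaway directory, uses a local bare repository as a stand-in for the remote on UiO's GitHub, and the identity and file contents are made up.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway sandbox; the bare repo stands in for the remote (assumption).
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
git clone --quiet "$sandbox/remote.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"

# Hypothetical identity, configured only for this sandbox repo.
git config user.name "Example Student"
git config user.email "student@example.org"

# A small Markdown file: one header and one line of text (made-up content).
printf '# Practice\n\nThis is a practice repo.\n' > README.md

git add README.md                        # stage the new file
git commit --quiet -m "Add README-file"  # record it in the local history
git push --quiet origin HEAD             # send the current branch to the remote

# Ask the stand-in remote for the subject line of the commit it received.
pushed_subject=$(git --git-dir="$sandbox/remote.git" log -1 --all --format=%s)
echo "$pushed_subject"
```

In your own repo, the clone step is already done, so only the `git add`, `git commit` and `git push` lines apply.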
Exercise 2: Download and manage files
We are now interested in the info about our course (IN5800) contained in a data file.
- [Bash] Make new directories called `downloads` and `data`
- [Bash] Download the Zip-file at https://leifhka.org/in5800/lectures/intro/data.zip into the newly created `downloads`-folder using `wget`
- [Bash] Unzip the archive and move the unzipped file (`data.csv`) into the `data`-folder
- [Bash] Use `cat`, pipe (`|`) and `grep` to print out the line starting with `in5800` (hint: the regular expression `^in5800.*` will match lines starting with `in5800`) from `data.csv`
- [Bash] Remove the folders `downloads` and `data` and all files contained in them
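To see how the hinted regular expression behaves, here is a minimal sketch on made-up data. The file name `sample.csv` and its rows are hypothetical and the real `data.csv` may look different. The sketch shows that `^` anchors the match to the start of a line, so a line that merely contains `in5800` is not printed.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical sample data in a throwaway directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/data"
printf '%s\n' \
  'in0001,made-up row one' \
  'in5800,made-up row two' \
  'xin5800,contains in5800 but does not start with it' \
  > "$workdir/data/sample.csv"

# cat writes the file to stdout, the pipe feeds it to grep,
# and ^ anchors the pattern to the start of each line.
matches=$(cat "$workdir/data/sample.csv" | grep '^in5800.*')
echo "$matches"

# rm -r removes the directory and everything inside it.
rm -r "$workdir"
```

The trailing `.*` is not needed for selecting the line, since `grep` prints the whole matching line either way. It only extends the match to the end of the line.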
Exercise 3: Make Makefile
As the data in the CSV-file from the previous exercise might change, we want to automate the steps done above, so we will make a Makefile for this.
- [Make] Open up a new file named `Makefile` in your favorite text editor and create one Make-rule per sub-exercise in the previous exercise
  - Be sure to include proper dependencies in your rules (e.g. the rule for the second sub-exercise should depend on the rule for the first)
  - Let the rule for the fourth sub-exercise be named `in5800_data` and the final rule be named `clean`
- [Make] Execute the `in5800_data` rule (note that when you execute a Make-rule, Make also outputs all the Bash-commands it executes)
- [Make] Execute the `clean` rule
- [Markdown] Add a new subheader `Use` to your `README.md`-file containing the text `Below is a list of useful commands:` (where `commands` is bold) followed by a list containing the two items `in5800_data` and `clean`
- [Git] Add, commit and push the changes done to the repo with the commit message `Add a Makefile to automate the information extraction.`
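As an illustration of how rule dependencies work, the sketch below generates a toy Makefile from Bash and runs it. It is not the solution to the exercise: the rule names `out` and `greeting` and the file `greeting.txt` are hypothetical, and the Makefile is written with `printf` only so that the required tab characters are visible as `\t`.

```shell
#!/usr/bin/env bash
set -eu

# Toy project in a throwaway directory; all names are hypothetical.
project=$(mktemp -d)
cd "$project"

# Recipe lines in a Makefile must start with a tab, written here as \t.
# "greeting: out" says that the out rule must run before greeting.
printf '%b\n' \
  '.PHONY: greeting clean' \
  '' \
  'out:' \
  '\tmkdir -p out' \
  '' \
  'greeting: out' \
  '\techo hello > out/greeting.txt' \
  '' \
  'clean:' \
  '\trm -rf out' \
  > Makefile

# Make prints each command as it runs it: first the one from out, then greeting.
make greeting
content=$(cat out/greeting.txt)
echo "$content"

# The clean rule removes everything the other rules produced.
make clean
```

Declaring `greeting` and `clean` as `.PHONY` tells Make that they are names of tasks rather than files, so they run even if a file with the same name happens to exist.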
Exercise 4: Branching and Make-variables
It is often nice to keep URLs out of the rules in Makefiles, and rather put them in separate variables. Thus, you will now fix your Makefile, but to be on the safe side, let's do the changes in a separate Git-branch and test them before merging it into your `master`-branch.
- [Git] Create a new branch (and switch to it) with the name `feature/URLs-in-vars`
- [Make] Make a new variable in your `Makefile` (e.g. `data_url`) that contains the URL of the ZIP-file to download, and replace the URL in the Make-rule with a use of the newly made variable
- [Make] Check that everything works by executing the `in5800_data` and `clean` rules
- [Git] Add, commit and push your changes to the `feature/URLs-in-vars`-branch
- [Git] Switch back to your `master`-branch and merge it with the `feature/URLs-in-vars`-branch
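The branch, change, test and merge workflow can be rehearsed in a throwaway local repo. The sketch below is an illustration under stated assumptions rather than the exercise solution: the branch name `feature/text-in-var`, the variable `greeting_text` and the rule `greet` are hypothetical, and nothing is pushed to a remote.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway local repo; all names below are hypothetical stand-ins.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Example Student"
git config user.email "student@example.org"

# Starting point on master: the text is hard-coded inside the rule.
printf '%b\n' 'greet:' '\techo hello' > Makefile
git add Makefile
git commit --quiet -m "Add Makefile"

# Create a feature branch and switch to it in one step.
git checkout --quiet -b feature/text-in-var

# Move the hard-coded text into a Make variable and reference it with $(...).
printf '%b\n' 'greeting_text = hello' '' 'greet:' '\techo $(greeting_text)' > Makefile
output=$(make -s greet)   # -s stops Make from printing the commands it runs
git add Makefile
git commit --quiet -m "Put greeting text in a variable"

# Switch back and merge the feature branch into master.
git checkout --quiet master
git merge --quiet feature/text-in-var
current_branch=$(git rev-parse --abbrev-ref HEAD)
echo "$current_branch: $(head -n 1 Makefile)"
```

Testing on the branch before merging means `master` only ever receives a Makefile that has already been run successfully.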
Solution
A solution to the exercises is provided here. It is wise to make an honest attempt at the exercises before consulting the solution ;)