Acknowledgements

Daan van Eijk (dvaneijk@icecube.wisc.edu) and Nahee Park (npark@icecube.wisc.edu) edited these tutorials for the 2018 IceCube Bootcamp. The original versions were written by Kyle Jero and James Bourbeau for previous IceCube Bootcamps. The section 'Introduction to Python' was heavily influenced by two sources:

Prerequisites

To follow these tutorials, you need to have a working installation of

In case you have any questions on installation of these packages or otherwise, please email Daan (daan.vaneijk@icecube.wisc.edu) or Nahee (npark@icecube.wisc.edu).

All the Python commands in this tutorial can be run in one of the following ways:

  • In a Jupyter notebook
  • On the IPython console (by typing 'ipython' on the command line)
  • By saving the commands in a file with the extension '.py' and running this file as a Python script by typing 'python [nameofyourpythonscript.py]' on the command line.

A few words on programming

More than likely you are sitting in front of a computer while reading this. If I told you to communicate with your computer and make it add some numbers or print out the current time, what would you do? Odds are you would not start typing in binary code that the computer understands directly to build those things from scratch. As you will soon see, you don't have to do that to get your point across effectively.

One picture you can keep in your head when imagining how programming works, at a very superficial level, is three people writing emails back and forth to each other. The first person is you, the second person is the interpreter, and the third person is your computer. You know how to write emails in English and, soon, Python. The interpreter knows how to speak both Python and the computer's language. Your computer knows only how to talk in its own language, but can speak any language if given very explicit directions on how to do so.

So when you receive a task that you want to program, you translate English into Python and write it down in an email to the interpreter. The interpreter receives this email and begins translating it into the computer's language, checking for errors as it goes. If an error is found, the interpreter will email you back and attempt to point out where your mistake was and what kind of error it thinks you made. Once the interpreter completes the entire translation without finding any errors, it will send that translation to your computer. When your computer receives the email, it begins running the program described. The program may still have errors that the interpreter did not catch, and if your computer catches them it will send an email to you specifying the error and its location so you can email the interpreter a modified version of the program. If there are no problems, the computer will complete any tasks you requested and write files to the system or email you back with the results you requested from the program.

Notice that at no point did the interpreter or your computer attempt to guess what you were trying to say; they simply read the emails and translate or run them. This means that if you write a program that is grammatically correct and can be run, it will run, and neither the interpreter nor your computer will warn you if it does the wrong thing. One way to help prevent mistakes is to ask your computer to be very verbose in its replies the first few times it runs a program (print a lot of information) and verify that what it's saying matches your original idea.

— Kyle Jero, wise hermit
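The three situations in this picture can be tried out directly. The following is only a minimal sketch: the snippets being compiled and the average() function are made up for illustration and are not part of the original tutorial. It shows an error caught during translation, an error caught only while running, and a program that runs happily but gives the wrong answer.

```python
# Stage 1: the interpreter translates the source first and reports
# grammar (syntax) errors before anything is run.
bad_source = "print('hello'"          # missing closing parenthesis
try:
    compile(bad_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    print("Interpreter complained:", err.msg)

# Stage 2: this source translates fine, but fails once it actually runs.
runnable_source = "result = 1 / 0"
code = compile(runnable_source, "<example>", "exec")
try:
    exec(code)
    runtime_ok = True
except ZeroDivisionError as err:
    runtime_ok = False
    print("Error while running:", err)

# Stage 3: grammatically correct and runs without complaint, but wrong.
def average(values):
    return sum(values) / 2            # bug: should divide by len(values)

numbers = [1, 2, 3, 4]
# Being verbose makes the mistake visible: the true average is 2.5.
print("numbers:", numbers)
print("sum:", sum(numbers), "count:", len(numbers))
print("average:", average(numbers))   # prints 5.0
```

Nobody warns you about the third case, which is why printing intermediate values on the first few runs is worth the effort.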

The command-line shell

What is a command-line shell?

  1. A program that interprets commands and arranges for them to be carried out
  2. Allows for users to interact with files and processes in the operating system
  3. A shell is not an operating system, it is a way to interface with the operating system and run commands

On most Linux systems, bash is the default shell program. However, there are other shell programs that can be used as well (e.g. ksh, zsh, etc.).

Some useful command-line commands and tips

Please open a terminal and try the following commands and tips on the command line:

  • Tab is your friend: if you get stuck trying to remember the exact name of a command, file, or path, hit tab a few times
    1. Commands and file names can be completed by hitting tab
    2. Tabbing once will complete the word you are trying to use if you have specified enough characters to make the word unique.
    3. Tabbing twice will present you with a list of words that match what you have typed thus far and are valid commands or file names.
  • Up and down arrows will allow you to see past commands.
  • Let's practice using tab and learn a shell command
    • Type 'to' and hit tab twice
    • You should be presented with a list of command names that match the word you have typed, i.e. start with 'to'.
    • Let's see what is running on our system by finishing the command so it reads 'top' and hitting enter: this program shows you the running processes and some basic information about them. We can now exit 'top' by hitting q which returns us to our shell.
  • Navigating directory structures:
    • The command 'pwd' tells us what our current directory is
    • The contents of our current directory can be found by using the command 'ls' without any arguments.
    • Change the current directory : 'cd [directory]'
    • We don't have to move to a directory to see what its contents are; that can be accomplished simply by using 'ls [directory]'. And don't forget tab completion is your friend!
    • The concatenation of directories is called a path. A path that does not start with the root directory / is called relative because it is specified relative to where we are in the directory tree. The same location can also be specified by an absolute (full) path, which starts at / and which we get by stringing together the result of pwd and the directory below us that is of interest.
    • There are a few other special names for directories in unix-like systems. The name '.' refers to the directory you are in now, and the command 'ls .' therefore gives the same results as 'ls'.
    • The name '..' refers to the directory above you. Check this by entering 'ls ..'. Notice that 'cd ..' also works as expected.
    • Programs can take options and arguments when they are called. Options are ways to invoke different runtime behavior for the program, whereas arguments are things which the program will be run on, such as a file or folder. Note that options are generally specified before arguments in commands and have a '-' or '--' prefix. Let's try an example with 'ls' by entering the command 'ls -lh .'. The options of this command are l and h. l tells ls to print file and folder information in the long format and h tells ls to make the size units human readable. The argument to this command is '.', which tells ls to look in the current directory for files and folders to give us information about.
    • To learn what options a program takes and what arguments it expects, the program 'man' (from manual) is often used. Let's get the manual for ls by using the command 'man ls'. man pulls up the manual of the program we specify in its argument. At the top of the manual we can see the usage line 'ls [OPTION]... [FILE]...', which shows that the program takes options followed by a file or folder as an argument. In the DESCRIPTION section a small snippet of what each option does is given. You can exit from man by typing q.
  • Some useful shell programs
    • 'echo' - Takes a word or words as an argument and prints them to the screen. Check by entering 'echo Hello World!'
    • 'mkdir' - Takes a word as an argument and makes a directory whose name is that word. Note that the space character is not interpreted as part of the word but as a separation between arguments. To make a directory with a space in its name you must put the name in quotes (""), so that it is interpreted as one argument.
    • 'touch' - takes a word as an argument and creates an empty file whose name is that word (if the file already exists, only its timestamp is updated).
    • 'mv' - moves a file from one location to another, either between or within directories. Moving within a directory amounts to renaming. Example: 'mv myfile.txt myfile.text'
    • 'cp' - makes a copy of a file with a new given name. Example 'cp myfile.text myfile2.text'
    • 'rm' - takes a file name as an argument and removes that file from the system. To remove directories there are two options. You can use the rmdir command if the directory is empty, or you can use rm with the -r, -ri, or -rf options, with r for recursively, i for interactively, and f for forcefully. Be careful with the -rf combination: it will delete whatever you request of it without asking. The other two are much nicer and recommended if you are not sure what is in a directory you are deleting.
    • 'less' - takes a file name as an argument and displays its contents. For files longer than the page you can use the arrows to scroll up and down through the text. Typing h will bring up the help manual, which lists the commands you can give while less is running. Typing q will exit less and bring you back to the shell.
    • 'grep' - takes a word and a file name and returns the lines from the file which contain the specified word.
    • 'wc' - takes a file and displays the number of lines, words, and characters in the file.
    • 'chmod' - changes the read, write, and execute permissions of a file/folder.
    • 'sleep' - Waits for a specified number of seconds
    • 'which' - Tells you what executable a command runs.
    • 'history' - Lists the commands that have been run up until this point. The output of this command is likely longer than can fit on one screen, so let's use it to lead into a short discussion about stream redirection.
  • When the command line prints something, that output is known as a stream. By default that output is printed to our screen, however we can also redirect it to other places.
    • '|' - is called piping. This sends the output of one command to another command as its input, as if that output were a file given to the second command. Example: 'history | less', which allows us to scroll through the history of our commands.
    • '>' or '>>' - are called redirecting output. They can be used to send the output of a command to a file. The difference between > and >> is that > overwrites the file with the new output from the command while >> appends to the end of the file. By default, the shell treats > and >> as if a 1 were placed in front of them; 1 is the specifier for standard output. If instead you use 2> or 2>>, the standard error is sent to the file.
    • '<' - is called redirecting input. This feeds the contents of a file to a command as its standard input.
  • When a command is run in the shell the system launches a process which it runs. By default the shell will wait until the process completes to present the option to run another command. There are however ways to run multiple commands from the same shell.
    • '&' - When put at the end of a command, the shell will run the process in the background instead of waiting for it. The number given upon the command being run is the process id given by the system and can be referenced later to alter the process.
    • 'fg' - Brings the most recent process back from the background to the active process in the shell.
    • 'Ctrl-c' - Holding the control and c keys together will kill the active process in the shell.
    • 'Ctrl-z' - Holding the control and z keys together will halt but not kill the active process in the shell.
    • 'bg' - Often when one uses Ctrl-z the process will need to be set running again. This can be accomplished with the fg command if one wishes to bring it back as the active process in the shell. The process can also be set to run in the background with bg.
    • 'top' - We already know this command :)! Can be used to view the processes on the system, running or otherwise. Also gives information about the resources that the process is using.
  • Wildcard characters allow users to specify that a single character or set of characters in an argument can be any character allowed in the shell.
    • '?' - Matches any single character.
    • '*' - Matches any number of characters (including none).
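
If you already have Python at hand, the two wildcards can be tried without touching any files. The sketch below uses the fnmatch module from the Python standard library, which matches names against shell-style patterns. The file names are hypothetical and chosen only for illustration.

```python
from fnmatch import fnmatch

# Hypothetical file names, chosen only for this illustration.
names = ["data1.csv", "data2.csv", "data10.csv", "notes.txt"]

# '?' stands for exactly one character, so data10.csv does not match.
one_char = [n for n in names if fnmatch(n, "data?.csv")]

# '*' stands for any number of characters, so all three data files match.
any_chars = [n for n in names if fnmatch(n, "data*.csv")]

print(one_char)
print(any_chars)
```

In the shell, the equivalent would be passing the same patterns as arguments to a command such as ls.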

This list of programs, their functions, and redirects/pipes is by no means complete. Search the internet or ask others about commands that fulfill your needs.

Exercise

  1. Let's create a file we will use later in the Python tutorial. We don't want to operate in the home directory, so:
    • Make a directory in wherever you're working called bootcamp_workdir.
    • In that directory make a file called names.txt.
    • In that file, enter the names and ages of 10 people, one person per line, by using the append redirect. Start by adding your new friend James who is 24.
  2. We probably should have named that file something different, since the name does not describe its contents. Instead of recreating the file with a different name, let's just move it to a file called people.txt.
  3. This file is rather important so we should make sure we don't lose it. Make a copy with the name people.txtbkp.
  4. Try using grep to find one person in the file and redirect the output to a file called favperson.txt.
  5. Find out how many characters are in the file favperson.txt using wc.
  6. List all the files in bootcamp_workdir that end in .txt.
  7. List all the files in bootcamp_workdir that have txt in their name.

Jupyter notebooks

According to the official website, "The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more."

When properly installed, the Jupyter notebook app can be launched directly from the command line by typing:

jupyter notebook

The notebook operates in a similar way to the IPython terminal in that it evaluates cells to create programs. The difference is that in the notebook the cells can be more than one line long and you don't have to evaluate a cell to get a new one to write in. Additionally, the notebook provides the ability to produce in-line graphics, use magic (see next section) and make comments in special cells that understand Markdown and LaTeX, making it a very powerful tool for documenting scientific programming. As an example of their usefulness, these tutorials were also made in a Jupyter notebook. There are some drawbacks to this way of coding though, so don't forget to use the terminal and scripts when it's appropriate!

IPython magic

Magic is real, at least in IPython and Jupyter notebooks. Magic commands start with a % and perform tasks outside of the scope of programming languages. For instance, at a certain point you may find yourself thinking: wow, how am I going to remember all these commands I just entered? Answer: with a little help from magic.

In [1]:
%history 1-10
%history 1-10

The %history magic will print all the commands in the specified range to the screen. To save this output to a file we should use %save with the file to save to and the line numbers we want to save as arguments.

In [2]:
%save Python_Intro_History.txt 1-10
The following commands were written to file `Python_Intro_History.txt.py`:
get_ipython().magic(u'history 1-10')
get_ipython().magic(u'save Python_Intro_History.txt 1-10')

To see the syntax of a magic command, use %command? where command is the magic you are interested in. Check out more magic commands on this IPython quick reference.

Introduction to Python

File input and output

The process of getting information from a file and putting it into one is very important. What we'll cover here should only be used if you are parsing plain text, not a CSV or other specialized format. There are much better ways to access formatted data and we will touch on those as they come up. But as an example, let's open a file, print out the contents line by line and then add text to the file.

We'll be working on the file named people.txt that you created in the section on the command line shell. If you didn't do that exercise, you can also download it here. Then, using your favorite text editor, open an empty file and save it as simplefilereadwrite.py. This will be your first Python script!

To open a file, use the built-in open() function. open() returns a file object and is most commonly used with two arguments. The syntax is:

file_object = open(filename, mode)

where file_object is the Python variable that will hold the file object, filename is the name of (or path to) the file, and mode describes the way in which the file will be used. The mode argument is optional; 'r' will be assumed if it's omitted.

File open modes

'r' when the file will only be read

'w' for only writing (an existing file with the same name will be erased)

'a' opens the file for appending; any data written to the file is automatically added to the end.

'r+' opens the file for both reading and writing.
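
To see how the modes differ in practice, here is a small sketch that exercises each of them on a throwaway file. The file name modes_demo.txt and the temporary directory are assumptions made for this illustration, so your people.txt is not touched. The with statement used here is just a convenient way to make sure each file is closed again when its block ends.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:       # 'w' creates the file
    f.write("first version\n")

with open(path, "w") as f:       # 'w' again erases the previous contents
    f.write("second version\n")

with open(path, "a") as f:       # 'a' adds to the end of the file
    f.write("appended line\n")

with open(path, "r") as f:       # 'r' only reads
    after_append = f.read()

with open(path, "r+") as f:      # 'r+' reads and writes with one file object
    f.read()                     # read to the end of the file...
    f.write("added via r+\n")    # ...so this write lands after the old text

with open(path) as f:            # mode omitted, so 'r' is assumed
    final = f.read()

print(final)
```

Note that "first version" never shows up in the result, because the second 'w' open erased it.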

Let's start typing in the Python script simplefilereadwrite.py (more information on writing Python scripts will follow later in the tutorials):

In [3]:
#!/usr/bin/env python

f=open("people.txt",'r+')
print(f.read())
James 24
Jane 40
Sam 12
Ben 1
Debbie 20
Peggy 30
Chuck 67
Mary 8
Buck 30
Burt 100

The location of people.txt may be different for you; if it is not in your current directory, you should give the path to it instead of just the file name. The f.read() command returns a string containing all characters in the file. Run this little script by entering on the command line:

python simplefilereadwrite.py

This should result in the same output as shown above. However, we said we wanted to do this line by line, so we need a different approach. A file object can be iterated over line by line with a for loop (more information on for loops will follow later in the tutorials). Thus we can adapt the script simplefilereadwrite.py to the following.

In [4]:
#!/usr/bin/env python

f=open("people.txt",'r+')

for line in f:
    print(line)
James 24

Jane 40

Sam 12

Ben 1

Debbie 20

Peggy 30

Chuck 67

Mary 8

Buck 30

Burt 100

The extra blank line between each line occurs because the print function and the file both produce a newline, giving us two. There are a number of ways to get rid of this effect. Since the hidden newline character that causes it is the last character of each line, we can get the results we want by taking all the characters but the last one. This works because strings can be sliced just like lists (more information on lists will follow later in the tutorials).

In [5]:
#!/usr/bin/env python

f=open("people.txt",'r+')

for line in f:
    print(line[:-1])
James 24
Jane 40
Sam 12
Ben 1
Debbie 20
Peggy 30
Chuck 67
Mary 8
Buck 30
Burt 10
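
One caveat with slicing is that [:-1] removes the last character whatever it is, so a final line that does not end in a newline loses a real character. A possible alternative, sketched below, is the string method rstrip("\n"), which only removes a trailing newline if one is present. The example uses io.StringIO as a stand-in for a file so it runs without people.txt; the three lines of text are an assumption for illustration.

```python
import io

# Stand-in for a text file whose last line has no trailing newline.
fake_file = io.StringIO("James 24\nJane 40\nBurt 100")

# Slicing drops the last character of every line, newline or not.
sliced = [line[:-1] for line in fake_file]

fake_file.seek(0)   # go back to the start to read the lines again

# rstrip("\n") only removes a newline at the end, if there is one.
stripped = [line.rstrip("\n") for line in fake_file]

print(sliced)     # the last entry has lost its final digit
print(stripped)   # the last entry is intact
```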

We also want to append a line to the end of the file, so let's learn how to do this and then add in the argparser stuff. The write method takes one parameter, which is the string to be written. To start a new line after writing the data, add a \n character to the end. When we are done with a file, we should close it.

In [6]:
f.write("Kevin 21\n")
f.close()
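
Because write only accepts a string and does not add a newline by itself, numbers have to be converted and the \n added by hand. The following sketch shows one way this could look when the names and ages are stored in Python variables. The file people_demo.txt, the temporary directory and the second person are made up for this illustration.

```python
import os
import tempfile

# Throwaway file so that people.txt is left alone.
path = os.path.join(tempfile.mkdtemp(), "people_demo.txt")

# Hypothetical people to add; the ages are numbers, not strings.
new_people = [("Kevin", 21), ("Alice", 35)]

f = open(path, "a")
for name, age in new_people:
    # str(age) turns the number into text; "\n" ends the line.
    f.write(name + " " + str(age) + "\n")
f.close()   # closing makes sure everything is written to disk

# Read the file back to check what was written.
f = open(path, "r")
lines = f.readlines()
f.close()

print(lines)
```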