Accessing functions beyond the built-in functions

If we want to use libraries and modules that are not part of Python's built-in functionality, we have to import them. There are a number of ways to do this.

In [1]:
import numpy,scipy

Imports the module numpy and the module scipy, and creates references to those modules in the current namespace. In other words, after you’ve run this statement, you can use numpy.name and scipy.name to refer to things defined in the modules numpy and scipy respectively.

In [2]:
from numpy import * 

Imports the module numpy, and creates references in the current namespace to all public objects defined by that module (that is, everything that doesn’t have a name starting with “_”). Or in other words, after you’ve run this statement, you can simply use a plain name to refer to things defined in module numpy. Here, numpy itself is not defined, so numpy.name doesn’t work. If name was already defined, it is replaced by the new version. Also, if name in numpy is changed to point to some other object, your module won’t notice.

In [3]:
from numpy import array, mean, std

Imports the module numpy, and creates references in the current namespace to the given objects. Or in other words, you can now use array and mean and std in your program.
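The difference between these import styles is easiest to see side by side. The sketch below uses the standard-library math module instead of numpy so that it runs without extra packages; the variable names are only illustrative.

```python
import math                    # plain import: names are reached through the module
from math import sqrt, pi      # selected names are copied into our namespace
import math as m               # an alias: a shorter name for the same module

area = math.pi * 2.0 ** 2      # qualified access through the module name
root = sqrt(16)                # plain name, no prefix needed
same_module = m is math        # the alias and the module are one object

# A later assignment silently replaces an imported name in our namespace,
# while the module's own attribute is left untouched.
pi = 3
print(area, root, same_module, pi, math.pi)
```

The last two lines show why importing plain names calls for some care: the local pi now differs from math.pi, and nothing warns you about it.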

More information on the numpy package will follow later in the tutorials.

Making and running python scripts

Thus far we have been entering code into the IPython terminal or a Jupyter notebook one or a few lines at a time. This is a great way to prototype an idea, but it is no way to write a full program! For writing programs we turn to scripts, and for that we need to exit IPython and open our favorite text editor. To exit IPython, enter the command exit() in the terminal and enter y if prompted.

When writing scripts, one should always keep coding style and coding conventions in mind. More information on coding style and conventions for Python (PEP) can be found in the tutorial section on Coding Style.

Let's move on to writing our first Python script and then talk more about things we can use in Python. For our first program let's keep it simple and use some ideas we have already worked with. How about we find all the prime numbers up to 100? To do this we need to test whether each number between 2 and 100 is divisible by any number smaller than itself (other than 1). We can do this pretty easily with a list, a few loops, and an if statement.

In [4]:
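#!/usr/bin/env python

# Test every number from 2 to 100 for primality by trial division.
numbers2to100 = range(2, 101)
for num in numbers2to100:
    prime = True
    for chknum in range(2, num):
        if num % chknum == 0:
            # Found a divisor, so num is not prime and we can stop checking.
            prime = False
            break
    if prime:
        print(num, "is prime")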
(2, 'is prime')
(3, 'is prime')
(5, 'is prime')
(7, 'is prime')
(11, 'is prime')
(13, 'is prime')
(17, 'is prime')
(19, 'is prime')
(23, 'is prime')
(29, 'is prime')
(31, 'is prime')
(37, 'is prime')
(41, 'is prime')
(43, 'is prime')
(47, 'is prime')
(53, 'is prime')
(59, 'is prime')
(61, 'is prime')
(67, 'is prime')
(71, 'is prime')
(73, 'is prime')
(79, 'is prime')
(83, 'is prime')
(89, 'is prime')
(97, 'is prime')

We can copy this script into our text editor and then save it with a descriptive name in a directory we will use for these script examples. Let's name it primenumbers.py, where the .py extension identifies what kind of file it is. Now we have a script and need to run it.

Running a python script

There are two main ways to run a python script. Firstly, we can give the path to our script as the argument of the python command on the command line in a shell by typing

python primenumbers.py

Secondly, we can include the so-called shebang line

#!/usr/bin/env python

as the first line of our script. If the file is executable (for example after running chmod +x primenumbers.py), this line tells the system to look up which python to use, and we can then run the script by typing on the command line

./primenumbers.py

Both methods produce the same result, so it's really just a matter of preference. Let's run our program by the first method for now by making sure our shell is in the same directory as our script and then running

python primenumbers.py

You should see the same result as when running it in the Jupyter notebook.

Getting command line arguments

One thing we do not want to do with a script is to keep editing and re-saving the file just to change parameters or the files it reads and writes. Instead, we want to specify such things as arguments after the script name on the command line. Command line arguments are not available through a built-in function, so we need to import a module from the standard library to get at them. For simple things we will use the sys module, and for more complex things the argparse module. Let's start with a simple example and discuss what it does and how to run it. Save the following code as simpleargv.py in your running directory.

In [5]:
#!/usr/bin/env python

import sys

print('Number of arguments:', len(sys.argv), 'arguments.')
print('Argument List:', sys.argv)
('Number of arguments:', 3, 'arguments.')
('Argument List:', ['/Library/Python/2.7/site-packages/ipykernel_launcher.py', '-f', '/Users/daanvaneijk/Library/Jupyter/runtime/kernel-6f54918b-eaa6-470d-9e60-609a537606fe.json'])

sys takes every word in your shell command after python (if you're using that invocation) and places it in a list, one element per space-separated word. This list is called sys.argv. The first element is the name of the script, so the length of sys.argv is the number of arguments plus 1, and you get the arguments you are feeding into the program by indexing into the list. Note that in the notebook output above, sys.argv holds the arguments of the Jupyter kernel rather than those of a script. Try running the program in the shell as follows and see if the output makes sense.

python simpleargv.py one two three
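As a minimal sketch of indexing into such a list, the example below works on a hand-written list that mimics what sys.argv would hold for the command above. The helper name split_argv is hypothetical and only used for illustration.

```python
import sys

def split_argv(argv):
    """Return the script name and the remaining arguments of an argv-style list."""
    return argv[0], argv[1:]

# Simulate the command: python simpleargv.py one two three
# (in a real script you would call split_argv(sys.argv) instead)
script, arguments = split_argv(["simpleargv.py", "one", "two", "three"])
print(script)            # simpleargv.py
print(arguments)         # ['one', 'two', 'three']
print(len(arguments))    # 3

# Every element is a string, so numbers must be converted explicitly.
numbers = [int(word) for word in ["10", "20"]]
print(sum(numbers))      # 30
```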

For more advanced input we might want the program to be more robust than just taking a list of words from the command line. This is where argparse comes in. With argparse we can designate specific options as valid and set variables in our script based on the options that are specified. Let's say that we would like our Python script to support a --name option. We could define this alone, but we could also give --name a shorter sibling, -n. Both --name and -n should have the same meaning; one is simply shorter and the other more verbose. To do this we will again look at a simple example and discuss how to run it. You can save the following code to your running directory and name it simpleargparse.py.

In [6]:
#!/usr/bin/env python

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-n', '--name')
parser.add_argument('-v', '--verbose')

args = parser.parse_args()

print(args.name,args.verbose)
usage: ipykernel_launcher.py [-h] [-n NAME] [-v VERBOSE]
ipykernel_launcher.py: error: unrecognized arguments: -f /Users/daanvaneijk/Library/Jupyter/runtime/kernel-6f54918b-eaa6-470d-9e60-609a537606fe.json
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2
/Library/Python/2.7/site-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)

After the shebang we import the module argparse and then create an ArgumentParser instance in our variable parser. Then we add two options and parse the arguments. Finally we print the stored values by accessing args.name and args.verbose. To run this program we could use

python simpleargparse.py -n James -v True

which should just result in James and True being printed.
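When parse_args() is called without arguments it reads sys.argv[1:], which inside a notebook is the kernel's own command line; that is why the cell above exits with an "unrecognized arguments" error. One way to try the parser without leaving the notebook, sketched here, is to pass the arguments explicitly as a list.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-n', '--name')
parser.add_argument('-v', '--verbose')

# Pass the arguments as a list instead of reading them from sys.argv.
args = parser.parse_args(['-n', 'James', '-v', 'True'])
print(args.name, args.verbose)

# Without a type, every value arrives as a string; omitted options are None.
empty = parser.parse_args([])
print(type(args.verbose), empty.name)
```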

Let's say we want our variable to be stored to a name we choose ourselves, define a default value, and also define beforehand what kind of variable we want the incoming variables to be. To do that, we have to add the dest, type, and default options respectively to our add_argument lines.

In [7]:
#!/usr/bin/env python

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-n','--name',dest='my_name',type=str,default='KillRoy')
parser.add_argument('-v','--verbose',dest='v',type=bool,default=False)

args = parser.parse_args()

print(args.my_name,args.v)
usage: ipykernel_launcher.py [-h] [-n MY_NAME] [-v V]
ipykernel_launcher.py: error: unrecognized arguments: -f /Users/daanvaneijk/Library/Jupyter/runtime/kernel-6f54918b-eaa6-470d-9e60-609a537606fe.json
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

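If we run the program with the same options as before, the results will be the same. However, we no longer have to specify them; any option we leave out takes its default value. One caveat: type=bool simply calls bool() on the text of the argument, and every non-empty string is true, so -v False also sets the value to True.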
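The behaviour of type=bool is worth checking for yourself, since bool() treats any non-empty string as true. The sketch below shows that effect and one common alternative, a flag declared with action='store_true' that takes no value at all. The parser names typed and flagged are illustrative, and arguments are passed as lists so the code also runs in a notebook.

```python
import argparse

# type=bool calls bool() on the raw string, and any non-empty string is truthy.
typed = argparse.ArgumentParser()
typed.add_argument('-v', '--verbose', dest='v', type=bool, default=False)
surprising = typed.parse_args(['-v', 'False']).v

# A flag that takes no value: absent means False, present means True.
flagged = argparse.ArgumentParser()
flagged.add_argument('-v', '--verbose', dest='v', action='store_true')
absent = flagged.parse_args([]).v
present = flagged.parse_args(['-v']).v

print(surprising, absent, present)
```

With the store_true variant the command line becomes python simpleargparse.py -n James -v, without a trailing True.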

Another thing that argparse is great for is documentation. By running the program and specifying the -h option we can see the help information specified in the add_argument lines and a program banner specified in the ArgumentParser.

In [166]:
#!/usr/bin/env python

import argparse
desc="Just a test program to demonstrate some argparse basics"
parser = argparse.ArgumentParser(description=desc)
parser.add_argument('-n','--name',dest='my_name',
                    type=str,default='KillRoy',
                    help="Give your name so I can print it.")
parser.add_argument('-v','--verbose',dest='v',
                    type=bool,default=False,
                    help="Do you want me to be verbose?")

args = parser.parse_args()

print(args.my_name,args.v)
usage: ipykernel_launcher.py [-h] [-n MY_NAME] [-v V]
ipykernel_launcher.py: error: unrecognized arguments: -f /Users/daanvaneijk/Library/Jupyter/runtime/kernel-250fc852-d38b-4009-9d48-ae9cebaebb2a.json
An exception has occurred, use %tb to see the full traceback.

SystemExit: 2

Now running the program with the -h option gives you a description of the program and the options it uses.

python simpleargparse.py -h
usage: simpleargparse.py [-h] [-n MY_NAME] [-v V]

Just a test program to demonstrate some argparse basics

optional arguments:
  -h, --help            show this help message and exit
  -n MY_NAME, --name MY_NAME
                        Give your name so I can print it.
  -v V, --verbose V     Do you want me to be verbose?
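The help text can also be inspected without running the script from a shell. As a small sketch, format_help() returns the same text that -h prints, but as a string and without exiting; the prog argument is set here only so that the usage line shows the script name instead of the notebook kernel.

```python
import argparse

desc = "Just a test program to demonstrate some argparse basics"
parser = argparse.ArgumentParser(prog='simpleargparse.py', description=desc)
parser.add_argument('-n', '--name', dest='my_name', type=str, default='KillRoy',
                    help="Give your name so I can print it.")

# format_help() returns the text that -h would print, without exiting.
help_text = parser.format_help()
print(help_text)
```

On Python 3.10 and later the section heading reads "options:" rather than "optional arguments:".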

The ArgumentParser adds a lot of great functionality and all the options can be found at the Python documentation pages online. In addition, have a look here for a tutorial on argparse in Python 2.

Coding Style

Before we get too far along and anyone develops bad habits, let's stop and talk about some style guidelines you should follow, as they are laid out in the Python Enhancement Proposal on coding style (PEP 8).

Indentation

Python is different from most other languages in that it uses white space to determine which control statement a line of code belongs to. Use 4 spaces for indentation. This is enough space to give your code some visual structure, while leaving room for multiple indentation levels. Most text editors let you change the settings so that a tab inserts 4 spaces.
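To make the role of indentation concrete, here is a short sketch with three nested levels. The function name classify is made up for the example.

```python
def classify(numbers):
    """Label each number as even or odd, relying on indentation for structure."""
    labels = []
    for number in numbers:          # level 1: inside the function
        if number % 2 == 0:         # level 2: inside the loop
            labels.append("even")   # level 3: inside the if
        else:
            labels.append("odd")
    return labels                   # back at level 1: runs after the loop ends

print(classify([1, 2, 3]))
```

Moving the return line one level to the right would place it inside the loop, and the function would return after the first number.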

Line Length

Use up to 79 characters per line of code, and 72 characters for comments. This is a style guideline that some people adhere to and others completely ignore. It used to relate to a limit on the display size of most monitors. Now almost every monitor is capable of showing much more than 80 characters per line, but we often work in terminals, which are not always high-resolution, and we also like to have multiple code files open next to each other. It turns out this is still a useful guideline to follow in most cases. When a function call's arguments get long, you can simply start a new line after any argument; Python will keep reading the arguments until it finds the matching ')'. If a string or equation runs too long, a backslash ('\') can be used to break it into more than one line.
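Both ways of breaking a long line can be sketched in one small function. The function weighted_mean and its parameter names are invented for the illustration.

```python
def weighted_mean(first_value, second_value, third_value,
                  first_weight, second_weight, third_weight):
    """Arguments may continue on the next line inside the parentheses."""
    # A backslash at the end of a line continues the expression.
    total = first_value * first_weight + \
        second_value * second_weight + \
        third_value * third_weight
    # Wrapping the expression in parentheses works without backslashes.
    weights = (first_weight
               + second_weight
               + third_weight)
    return total / weights

result = weighted_mean(1.0, 2.0, 3.0,
                       1.0, 1.0, 2.0)
print(result)
```

PEP 8 prefers the parenthesized form where it is possible, since a stray space after a backslash breaks the continuation.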

Blank Lines

Use single blank lines to break up your code into meaningful blocks.

Comments

Use a single space after the pound sign (#) at the beginning of a line. If you are writing more than one paragraph, use an empty line with a pound sign between paragraphs.

Naming Variables

Name variables and program files using only lowercase letters, underscores, and numbers. Python won't complain or throw errors if you use capitalization, but you will mislead other programmers, because by convention capitalized names signal classes (CapWords) or constants (ALL_CAPS).
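The guidelines on blank lines, comments, and naming can be seen together in a few lines. This is only an illustrative sketch; the variable names are made up.

```python
# Convert a list of temperatures from Celsius to Fahrenheit.
#
# The empty comment line above separates two comment paragraphs.
temperatures_celsius = [0.0, 50.0, 100.0]

# A blank line sets the conversion step apart from the setup.
temperatures_fahrenheit = []
for temp_c in temperatures_celsius:
    temperatures_fahrenheit.append(temp_c * 9.0 / 5.0 + 32.0)

print(temperatures_fahrenheit)
```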

A word of caution

  • Python is designed to be readable and easy to use. Optimal speed and memory allocation are not its forte. If you are going to be using large arrays or performing calculations that need to be fast, pure Python is probably not what you want. Use Python for prototyping and then port to C++ if you want better performance.
  • IPython notebooks are great for quick calculations but not for long-running jobs. If you lose the connection to the notebook server, you lose all the work you have performed. Prototype in the notebook and then move to a script which you can run safely on your machine, the cobalts (you will understand what that means after bootcamp :)), or a cluster.

Additional Python Packages

NumPy

The NumPy (Numeric Python) package provides efficient routines for manipulating large arrays and matrices of numeric data. It contains among other things:

  • A powerful N-dimensional array object (numpy.ndarray)
  • Broadcasting functions
  • Useful linear algebra, Fourier transform, and random number capabilities

By convention, NumPy is usually imported via

In [119]:
import numpy as np

The fundamental data structure that NumPy gives us is the ndarray (usually just called "array"). According to the NumPy documentation

An array object represents a multidimensional, homogeneous array of fixed-size items. An associated data-type object describes the format of each element in the array (its byte-order, how many bytes it occupies in memory, whether it is an integer, a floating point number, or something else, etc.)

Generally, think of an array as an (efficient) Python list with additional functionality. BUT keep in mind that there are a few important differences to be aware of. For instance, array objects are homogeneous: the values in an array must all be of the same type. Because arrays are stored in an unbroken block of memory, they need to be of fixed size. While NumPy does support appending to arrays, this can become problematic for very large arrays.
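Both differences can be checked in a few lines. The sketch below assumes NumPy is installed; the variable names are illustrative.

```python
import numpy as np

# Mixing integers and a float upcasts every element to a common type.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# np.append does not grow the array in place; it returns a new, copied array.
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.size, extended.size)
```

Because each np.append call copies the data, building a large array element by element in a loop is slow; collecting the values in a list and converting once is usually the better pattern.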

In [120]:
array = np.array([1, 2, 3, 4, 5, 6])
print(array)
print(type(array))
[1 2 3 4 5 6]
<type 'numpy.ndarray'>

The datatype of an array can be found using the dtype array attribute

In [121]:
print(array.dtype)
int64

If not specified, NumPy will try to determine what dtype you wanted, based on the context. However, you can also manually specify the dtype yourself.

In [122]:
array = np.array([1, 2, 3, 4, 5, 6], dtype=float)
print(array)
print(array.dtype)
[1. 2. 3. 4. 5. 6.]
float64

Array attributes reflect information that is intrinsic to the array itself, for example its shape, the number of items in the array, or (as we've already seen) the data type of the items.

In [123]:
array = np.array([[1, 2, 3],[4, 5, 6]], dtype=float)
print(array)
[[1. 2. 3.]
 [4. 5. 6.]]
In [124]:
print(array.shape)
print(array.size)
print(array.dtype)
(2, 3)
6
float64

In addition to array attributes, ndarrays also have many methods that can be used to operate on an array.

In [125]:
print(array.sum()) # Sum of the values in the array
print(array.min()) # Minimum value in the array
print(array.max()) # Maximum value in the array
print(array.mean()) # Mean of the values in the array
print(array.cumsum()) # Cumulative sum at each index in the array
print(array.std()) # Standard deviation of the values in array
21.0
1.0
6.0
3.5
[ 1.  3.  6. 10. 15. 21.]
1.707825127659933

NumPy naturally supports various matrix operations:

In [126]:
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(M)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
In [127]:
M.T
Out[127]:
array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])
In [128]:
M.diagonal()
Out[128]:
array([1, 5, 9])
In [129]:
M.dot([1, 2, 3])
Out[129]:
array([14, 32, 50])
In [130]:
M.trace()
Out[130]:
15

To learn more about the motivation and need for something like NumPy, check out this great blog post Why Python is Slow: Looking Under the Hood.

Arithmetic operators act element by element on arrays:

In [131]:
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])
print(array1 + array2)
[ 6  8 10 12]
In [132]:
2*array1
Out[132]:
array([2, 4, 6, 8])
In [133]:
array1**2
Out[133]:
array([ 1,  4,  9, 16])

Let's get an idea of how much NumPy speeds things up.

In [136]:
from IPython.display import Image
Image('img/squares-list-creation.png')
Out[136]:
In [137]:
Image('img/sum-range.png')
Out[137]:
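The images are not reproduced here, so as a rough stand-in the sketch below times the creation of a list of squares against the equivalent array operation using the standard-library timeit module. The array size and the number of repetitions are arbitrary choices, and the timings depend on the machine, so no numbers are quoted.

```python
import timeit
import numpy as np

n = 100000
values_list = list(range(n))
values_array = np.arange(n, dtype=np.int64)

def squares_with_list():
    """Square every element with a plain Python list comprehension."""
    return [value ** 2 for value in values_list]

def squares_with_numpy():
    """Square every element with a single vectorized array operation."""
    return values_array ** 2

# Time 20 repetitions of each approach.
list_seconds = timeit.timeit(squares_with_list, number=20)
numpy_seconds = timeit.timeit(squares_with_numpy, number=20)
print("list comprehension:", list_seconds)
print("numpy array:       ", numpy_seconds)
```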

NumPy Exercises

  1. Find the dot product of the matrices [[1,2,3],[4,5,6]] and [[7,8],[9,10],[11,12]]
    • Answer is [[58,64],[139,154]]
  2. Find the cross product of the first matrix from exercise 1 with [[7,7,7],[7,7,7]]
    • Answer is [[-7,14,-7],[-7,14,-7]]
  3. Find the eigenvalues of [[1,2,3],[4,5,6],[7,8,9]]
    • Answer is [1.61168440e+01,-1.11684397e+00,-1.30367773e-15]
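One possible way to check your answers, sketched with np.dot, np.cross, and np.linalg.eigvals; the variable names are illustrative. The third eigenvalue is zero up to rounding error, so its printed value and the order of the eigenvalues may differ slightly from the answer above.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8], [9, 10], [11, 12]])
sevens = np.array([[7, 7, 7], [7, 7, 7]])
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

dot_product = np.dot(a, b)              # matrix product of a (2x3) and b (3x2)
cross_product = np.cross(a, sevens)     # vector cross product, taken row by row
eigenvalues = np.linalg.eigvals(m)      # the tiny third value is rounding noise

print(dot_product)
print(cross_product)
print(eigenvalues)
```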