An instant primer on using Python to plot data                      Humphrey

On the lab computers, there is a nicely packaged ‘distribution’ of Python called ANACONDA (It is free to download on your own machines, just search on the web for ANACONDA python).  A ‘distribution’ is just a convenient bundle of related programs, and you can download this same package freely onto your own computer.  We will mainly be using the Graphical User Interface (GUI) called Spyder.  So, open the ANACONDA directory and double-click on Spyder.   You will (eventually) get a window with several sub-windows (which you can detach and move around etc, have fun). SPYDER has a highly modifiable interface with numerous possible open views or windows, thus it may look quite different from somebody else’s screen!  SPYDER will open in one or more windows, (they are detachable from the main frame, and thus may appear anywhere on the desktop). To start with we will only use the IPython ‘Console’.  (the I stands for IDLE, which should mean something to Monty Python fans).  The other 2 windows that we will eventually use a lot is the ‘editor’ and the ‘variable explorer’.

Minimally the ‘Console’ window allows you to directly type python commands and see the results, while the ‘editor’ window allows you to write and edit and ‘run’ simple or complex python programs.

To get started, we will use the Console. Things you type are in bold below.  Asides or notes are in red. Locate the main ‘Console’ and type into the empty space, and after the ‘In [1]:’ prompt ( the number in the [ ] is the prompt index number).

The below illustrates using Python as a calculator. 

1. For the newbie, try the below:

Try typing

2*3

python will answer with

OUT [1]: 6

next, try

x = 3*4.2

Python will answer with

IN[2]: blank!

(nothing!!, don’t panic, just type x)

x

Now python says:

Out[3]: 12.600000000000001

 

The reason python didn’t reply to the x=3*4.2 is that the console assumes you know what you are doing(!).  In the first example, you multiplied 2 numbers, but didn’t put the result anywhere, so the console showed you the result.  While in the 2nd case, the console assumed you wanted to put the result in x, so it did that.  You can use any basic math operators  that you want.  The common ones are of course + - * /.  Power is **, and // means integer divide.  Look online for help (there is far too much!).

3 important notes:

1- A subtle, but important point: the ‘=’ sign is different in computer work than you may be used to from Math.  It is a right to left assignment statement.  In other words, ‘=’ in python uses the left hand side as a named place to store the right hand side.  So in python the ‘=’ sign is NOT symmetric: x=y puts whatever is ‘y’ into the named location ‘x’.  And x can contain anything, so x = “myfoot” puts the string “myfoot” into a location called ‘x’. (‘x’ is then referred to as a ‘variable’, which now happens to contain a string). The other type of ‘equal’ in numerical work is asking the question; does x equal y?  This is written x==y, and evaluates to either TRUE or FALSE (FALSE = 0, TRUE is anything else, normally 1)

2-Errors messages can be obscure, but typically begin with the line: “Traceback (most recent call last):” After a while you will be able to (usually) figure out what Python doesn’t like, but for now it is mainly telling you that you have an error.

3-typing ‘?’ after a variable name (or most objects) will bring up a ‘help’ string (usually a lot longer than you wanted)

 

A couple of more notes might help.

1) any ‘variable’ (eg. a variable like x or y or myLeft_foot or whatever) retains its last value.

In this example x retains its value from above and can be used again as in:

y = x + 3.2

Python will type the anwser, if you type y: 

y

A more elegant way of getting the value of y, would be to use the print() function:

print(y)

returns the same. But print() allows you to format your answer:

print('{0:6.2f}'.format(y))

I won’t try to explain the ‘format’ here, but will in class, suffice to say you can do anything you want to make the output fancy. You should play with the calculator (Ipython console) to see that it does that you expect, and what it doesn’t.

Moving on from a calculator: A fundamental aspect of Python is that it can be written as a procedural language, as in our calculator work above. (Where ‘procedural’ means that we explicitly write exactly what when and how the computer moves data around.) However Python is really an object oriented language, and you need to understand the basics of that concept. Everything in Python is an object, and objects can (and usually do) have ‘methods’ that work with these objects. We actually use this concept in algebra a lot. A number can be added, multiplied etc, and these are ‘methods’ of all numbers. A string of letters can be turned to upper case or made italic, or joined to another string; these are string methods. A number can’t be turned to upper case, and a string can’t have its square root taken. Thus we unconsciously recognize that there are certain methods that work with various objects. Python formalizes this concept with the widely used ‘.’ notation.

As an example, using the string x from the above discussion: (you can type this in the console)

x = “myfoot

x?                   (see what you get?)

y = x.upper()

print(y)

or

print(x.upper())

uses the ‘upper’ method of the ‘string’ class of variables, to turn the string to upper case. ‘upper’ only works on strings, it is a string method. Note the ‘()’ are there because many methods take arguments or modifiers.

Storing more than one thing into a location (such as a huge matrix): Single numbers can be stored into a named variable.  If we want to store multiple numbers, such as a vector or a matrix we will mainly use a special array variable which we will discuss later. However basic Python has a storage variable called a list that can be used to store a list of numbers (or anything such as strings or letters or whatever).  We will use lists later, but I introduce them here to introduce ‘indexing’ or addressing the elements of a list.  This information is crucial for addressing more complex arrays, since by far the most common numerical modelling errors are indexing errors.

A list is created by enclosing the list in square brackets and separating with commas:

n = [2,3,4,5]

makes a list. Now the elements of n can be addresses using the index (number location, remembering that indexes in python start at 0!)

n[2]

will give the result ‘4’.  There are lots of fancy ways of addressing list elements, for example you can address backwards from the end, so that n[-1] results in ‘5’.  We will also use ‘slicing’ a lot.  A slice of a list is a section of the list, so n[0:2] results in another list [2,3].  Note when slicing the end of the slice does NOT include the last index.

Finally before we move on to scientific python. There are some syntax aspects of python. Instructions are generally written one per line, and there is no line termination character. A ‘#’ indicates that the rest of the line is a comment not a program line. The most important syntax however is indentation. We haven’t looked at loops etc, but blocks of code inside loops are indented. The convention is that they are indented 4 spaces. The most frequent coding error in python is indentation errors, and probably the most frustrating is if you happen to use a tab instead of 4 spaces… it looks the same but causes your program to crash!

To illustrate the first iterative ‘loop’, consider the following code snippet which makes a short list and then loops over the list using a ‘for’ loop. A for loop executes the code inside (indented code) the loop each time thru the loop. The loop variable ‘j’ takes a new value from the list each time thru, and it this case prints the square of j.  Note the necessary syntax: there must be a ‘:’ after the ‘for’ statement and the loop must be indented.  Anything not indented signals the end of the loop.

n = [1,2,3,4,5]

for j in n:

    print(j*j)    #print the squares of the list n

print(‘Done’)

# the end

Programming:  Although the console is very useful we need to learn how to program. The other important window in Python is the editor. Anything you type into the editor can be saved and then ‘run’ as a program (assuming it is valid python code). The output will appear in the console (including error messages). Your code will be saved as ‘filename.py’ (you get to chose the name as long as it ends in .py). At this point you should make a dedicated folder for the programs you are going to use in this class.

So, click for a ‘new’ program in the ‘file’ menu above the editor and copy and paste the above code into the “Editor” window (big left hand window). 

Nothing will happen until you click on the green arrow (‘run’) on the top bar. It may work, and the results will appear in the Console!  Success! Or you may get a ‘traceback..’ error, in which case there is probably an indentation error in the pasting. We will talk about that.  Or (depending on how your Spyder is set up) you may be asked 2 things: the first is a big list of check boxes...just click ok and move on, and you may be asked where to store your program.  Even if you are not asked where to store this great program, you should save it just to see how it is done. You probably want to make a folder called something like ‘PythonPrograms’.  And you probably want to either keep that folder on a thumb drive or on your own laptop.

Success?  (note in the console, you can use a lot of simple LINUX commands to navigate, such as ‘pwd’ to find what folder you are in, and ‘ls’ to see the files in the folder).

As a final step, you should be able to delete your program from the editor and then go to the ‘file’ menu and ‘open’ it.

There! You have now written, debugged, run, saved, and loaded a python numerical program. 

 

Moving on to more scientific python

Pure python is somewhat restricted, for example it doesn’t have built-in Stats routines, or Matrix routines, efficient floating point calculations, or even methods to access the operating system.  It is designed to be a basic framework, to which you add modules that add functionality.  The advantage of this is that the basic language is simple, the disadvantage is that you can’t do much ‘scientific’ with it! To solve this, there are 2 add-on modules that form the basics of ‘scientific python’ (they are actually a part of an even larger set of libraries called SciPy). The 2 modules we will use a lot in this class are ‘numpy’ and ‘matplotlib’. Numpy adds a host of mathematical manipulations, including a vast collection of linear algebra operations and various equation solvers.  Numpy’s biggest feature is that it adds efficient multidimensional numerical arrays to python. Matplotlib adds the ability to make publication quality plots, using a syntax similar to MATLAB’s plotting routines.

We calculate the value of the constant pi to illustrate basic programming: this shows a complete python program, it also introduces the usage of numpy.  The typical layout of a small program is to begin with some descriptive comments, then list the extra ‘imports’ for the program, (this is also where our own functions [to be discussed later] go), then a header section containing user defined values used in the program, and finally the actual commented code.

# using the (very inefficient) Euler series for estimating PI

# PI**2 /6 = 1+1/2^2 + 1/3^2 + 1/4^2 ....

# by convention, most imports of modules should be at the beginning of a program

 

import numpy as np          # this allows us to use any of the modules in numpy using the shorthand ‘np’, and we can call the using the ‘method’ technique:  np.methodname()

# also good programming practices puts any variables that the user might want to change as a header section

terms = 20                         # how many terms of the series are we going to use?

ourpi = 1                            # ourpi is the '~pi' given by increasing terms in Euler series, we start with the first term

 

for n in range(2,terms):      # example 'for' loop, the 'range()' function returns a list of numbers (start,stop)

    ourpi += 1/(n**2)            # each time through the loop, add an extra terms based on 'n' (which is incremented)

    print(n,np.sqrt(6*ourpi)) # 'print' is a complex function, but at its simplest it just prints readable numbers

print(np.sqrt(6*ourpi), "our estimate of pi after",n+1,"terms in series")  

print(np.pi,"an accurate value for pi")  # numpy has a built in value of pi called np.pi

The above code can be found here. 

This is the end of the basic primer.  Below are some code examples of: graphing or plotting, reading and writing files/data from the computer’s system, and an example of truncation and subtraction errors.

For the more advanced students to illustrate how the Greeks calculated pi (but didn’t have computers so couldn’t get this accuracy), this is heavily commented.  You will note some comments in the ‘for’ loop about a classic ‘subtraction’ error in numerics (which is further explored in the indicated additional code).

# calculating pi,

# To illustrate basic python loops,

# The greeks used inscribed and circumscribed polygons to estimate pi to a couple of decimal places

# With the computer, we can approximate pi by inscribing ever increasing polygons

# (the greeks had difficulty with the sheer number of calculations, we let the computer do it)

# the key, as Archimedes realized, was that using Pythagoras's theorm, he derived a recursive

# formula for the side length of a regular polygon, based on the side length of a n/2 polygon:

# Sn = sqrt( 1/2 - sqrt( (1/2)**2 - (Sm/2)**2 ) )

# where Sn is the side length of an n-sided polygon, and Sm is the side length of an (n/2) polygon

# this is a messy formula, which meant the greeks couldn't get much accuracy calculating by hand

# below we use this formula (slighly rearranged to reduce subtraction errors)

 

# import extra 'modules', this is typical if you want to do anything fancy

# we don't need it much here, but we will import 'numpy' as our basic fancy math resource

import numpy as np

 

# the initial part of any program usually is where you put any variables etc, that

# need to be set, or are likely to be changed by the user

# since we are going to iterate to find pi, set a max number of iterations

maxiter = 30

 

# start with an inscribed square (a 4-gon), since we know the edge length

si = np.sqrt(2)           # side of a inscribed square (si = n-gon) inside a radius 1 circle

 

print("n-gon: {0:10d} - {1:1.18f}".format(4,2*si) )              

for n in range(1,maxiter):  #

    t =  (si/2)**2                                                  # si here is the old side length

    si = ( np.sqrt( ((1-np.sqrt(1-t))**2 + t) ) )                   # si is the new side length

    print("n-gon: {0:10d} - {1:1.18f}".format(4*2**n,si*2**(n+1)) ) # multi by number of sides, 2**(n+1)

                                                                    # to get total circumference

   

# An interesting point is that tiny errors acumulate, resulting in errors at the

# 15 decimal place

pie = np.pi               # get an accurate (to at least 15 places) value for pi

print("Pythons default pi  {0:1.18f}".format(pie) )  # the numpy module has its own accurate value for pi

# major side note, python's value of pi has an error in the 16th place,

# this is due to storing np.pi in a single precision storage location

 

# print("Exact value pi      {0:1.18f}".format(3.141592653589793238) )  # doesn't work

print("Exact value pi      {0:1.15f}{1:3d}".format(3.141592653589793,238) )

# python defaults to its 'single precision' which is valid to about 14 or 15 significant

# digits. If you really need more you need to import multiprecison "mpmath" module

# otherwise variables are stored in containers that truncate

This code is located here.  If you are interested, here is some code that explores the problems associated with truncation errors and subtraction errors.)

 

A hugely important part of modelling is getting fancy graphic output to impress your friends (and make your scientific papers look good!). We will use matplotlib which can produce publication quality plotting.  To get going on actually using python, we will write a program to plot some data.  Actually we will just plot a sine wave. The following code will be explained in class. It can be run in the ‘editor’ window. The three code snippets below can be found in copyable form here a b c

I find it useful to change the default of SPYDER to get a separate plot (not one that appears in the ipython window).  Go to ‘tools/preferences/ipython console/graphics’ and change ‘inline’ to ‘automatic’.   Separated plots can be resized and ‘zoomed’ etc, which makes them more useful.

import numpy as np

import matplotlib.pyplot as plt

# this program is much more 'procedural' than good python can be, but we will learn as we go along

n = 100 # the number of points we want to plot

 

# the following 'loop' is a VERY poor way of making an array or list, but it is easy to understand at this point in the class

x = [] # create an empty list

y = []

for z in np.linspace(0,7,n): # ‘for loops’ cycle over a list, in this case a list of numbers generated by the numpylinspace’ method, don’t forget the ‘:’

    y.append(np.sin(z))

    x.append(z) #create lists of x and y. Lists must be 'appended' to to extend

 

fig = plt.figure()

ax = fig.add_subplot(111)

ax.plot(x,y,'r-*')

# could stop here, but can pretty it up a little

xspan = max(x) -min(x)

yspan = max(y) -min(y)

minx = min(x) - 0.05*xspan

maxx = max(x) + 0.05*xspan

miny = min(y) - 0.05*yspan

maxy = max(y) + 0.05*yspan

ax.axis([minx,maxx,miny,maxy])

ax.set_title("Simple Sine plot")

ax.set_xlabel("Radians")

 

plt.show()

 

Finally, most of our work will be wrting code in the editor window and ‘running’ it. But we can also interact with the computer directly to save or load data, or to run a PYTHON code that exists outside of our console. Typically to run a program from outside we open it using the ‘file’ menu, and it is loaded into the editor and can be ‘run’. To illustrate some of of these ideas, we can make the sine wave data creation into a script in the editor as shown below. When ‘run’ it will write the sine wave data to a file on the computer. And in ‘running’ the program we will be asked to save it as a script with a name so we can open it and run it again.

 

import numpy as np                                                         # import the numpy module so we can do some real math, such as taking sine(x), this type of import allows us to call numpy ‘np’

try:                                                                                   # one of the many elegant python features is the TRY block, which doesn’t crash the program if the program makes an error!

    savefilename = "sinedata.txt"                                     # make a string variable

    outfile = open(savefilename,'w')                                # use a built-in function to open a file for writing (create if necessary)

    for x in np.linspace(0,7,100):                                     # ‘for loops’ cycle over a list, in this case a list of numbers generated by the numpylinspace’ method, don’t forget the ‘:’

        outfile.write(str(x) + " " + str(np.sin(x)) + "\n")    # this string concatenation will be explained in class.  Note ‘write’ is a method of the ‘file’ class.

    outfile.close()                                                           # it is always good to clean things up (close the file), and to print out a few messages to tell us that things worked

    print(savefilename + “ created")

except:

    print("Failure!")

 

Now that we have data file, we can write a routine to plot the data, note this is a useful little program to keep around to plot simple data, and to copy from when you need a plot in your program. (this next concept won’t be used in this class but introduces some very useful ideas) This last bit of code below is basically the plotting code from above, but it is written as a standalone script that can be run outside of the ANACONDA environment. To use it, you could save it as something like ‘myplot.py’. Then in the console, to run it, you would type ‘ %run myplot.py sinedata.txt ‘, or you can run it as a standalone program on your PC. (I will talk about this in class.)

 

#!/usr/bin/env python

"""

Simple plotter to show an XY plot of a file with at least two cols of data

Plots the first two cols

"""

 

import sys

from numpy import loadtxt              # pandas read.csv seems to the most elegant method, but this is a simple ascii file reader

import matplotlib.pyplot as plt

 

if (len(sys.argv) > 1):

    infile = sys.argv[1]

    x,y = loadtxt(infile,unpack=True,usecols=(0,1))

    fig = plt.figure()

    ax = fig.add_subplot(111)

    ax.plot(x,y)

    # could stop here, but can pretty it up a little

    xspan = max(x) -min(x)

    yspan = max(y) -min(y)

    minx = min(x) - 0.05*xspan

    maxx = max(x) + 0.05*xspan

    miny = min(y) - 0.05*yspan

    maxy = max(y) + 0.05*yspan

    ax.axis([minx,maxx,miny,maxy])

    if (len(sys.argv) > 2) :

        ax.set_title(sys.argv[2])

    plt.show()

else:

    print("Syntax: xyploter <infile> [title]")