Line Graphs and Scatter Plots

Short look at line graphs

In [ ]:
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

Let's consider the following line graph.

In [ ]:
plt.plot([3, 5, 2], [2, 0, 8], "^k:")
Out[ ]:
[<matplotlib.lines.Line2D at 0x7fe29d28f358>]

Let's explore what the individual components actually do:

  • plt.plot() generates the basic graph, as we have seen before.
  • [3, 5, 2] are the x-values and [2, 0, 8] are the y-values of our graph. This gives the three (x, y) points (3, 2), (5, 0), and (2, 8). The graph then draws a line between the points in the order in which they are specified.
  • "^k:" is again a format string (fmt). The documentation gives its form as fmt = '[marker][line][color]', and other orderings, such as the marker, color, line order used here, are also accepted. In our case "^" stands for a triangle marker, "k" makes the markers and their connecting line black, and ":" turns the line into a dotted line.

As we can see, documentation is key for understanding how plots work!
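
To make the role of each character in the format string concrete, here is a minimal sketch that draws the same graph with keyword arguments instead of "^k:". The variable name line is our own choice; marker, color, and linestyle are the keyword counterparts of the three parts of the format string.

```python
import matplotlib.pyplot as plt

# Same graph as above, with each part of the format string "^k:" spelled out.
(line,) = plt.plot(
    [3, 5, 2], [2, 0, 8],
    marker="^",     # triangle markers
    color="k",      # black
    linestyle=":",  # dotted line
)

# The returned Line2D object remembers these settings.
print(line.get_marker(), line.get_color(), line.get_linestyle())
```

The keyword form is longer, but it is easier to read when you come back to the code later.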

Exercise For You
Again try to guess which plot the following code returns. Use the empty cell below to verify your result.

plt.plot([2,3,4,5],[5,4,3,2], "Dg-.")

Hint: Use the documentation (plt.plot?) to find the meaning of the format string "Dg-.".

In [ ]:
#run to access documentation
plt.plot?
In [ ]:
# test your answer

Scatter Plots

Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. The plot used in our iris dataset example above was a scatter plot.

Scatter Plots using plt.plot()

Technically, we can create a scatter plot with the standard plt.plot() function by using the "o" format string, which draws a circle marker at each point and no line between the points. This would look as follows:

In [ ]:
plt.plot([2,3,4,5],[5,4,3,2], "o")
Out[ ]:
[<matplotlib.lines.Line2D at 0x7fa9176927d0>]

Scatter Plots using plt.scatter()

A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function:

In [ ]:
plt.scatter([2,3,4,5],[5,4,3,2])
Out[ ]:
<matplotlib.collections.PathCollection at 0x7fa9175eb790>

The primary difference of plt.scatter from plt.plot is that it can be used to create scatter plots where the properties of each individual point (size, face color, edge color, etc.) can be individually controlled or mapped to data.

Let's show this by creating a scatter plot with points of many colors and sizes.

In [ ]:
x = [1,2,3,4,5,6,7,8,9]
y = [1,2,3,4,5,4,3,2,1]

colors = ["green", "blue", "red", "yellow", "cyan", "black", "black", "white", "green"]
size = [100,150,200,250,300,250,200,150,100]

# creates a plot with points at x:y coordinates that have size s and colors c
plt.scatter(x, y, s=size, c=colors)
Out[ ]:
<matplotlib.collections.PathCollection at 0x7fa91717ef10>
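
Instead of naming every color by hand, the point properties can also be derived from data. The following is a minimal sketch under our own assumptions: the list values is made-up illustration data, and the "viridis" colormap and the size factor 5 are arbitrary choices.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 5, 4, 3, 2, 1]

# Made-up measurements, one per point (for illustration only).
values = [10, 20, 30, 40, 50, 40, 30, 20, 10]

# Numbers passed to c are translated into colors by the colormap;
# the marker size s is derived from the same data here.
points = plt.scatter(x, y, s=[5 * v for v in values], c=values, cmap="viridis")

# The colorbar shows which color corresponds to which value.
plt.colorbar(points, label="value")
```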

Exercise For You
Try to guess the output of the following code snippet.

x = [1,2,3,4,5,6,7,8]
y = x

colors = ['green']*(len(x)-1)
colors.append('blue')

plt.scatter(x, y, s=100, c=colors)
In [ ]:
#Test your answer here

Exercise For You
Try to guess the output of the following code snippet.

x = [1,2,3,4,5,6,7,8]
y = x
colors = ["orange" if x % 2 == 0 else "black" for x in range(len(x))]

plt.scatter(x, y, s=100, c=colors)
In [ ]:
# Test your answer here

Adding labels

To make your graph more meaningful, it is often necessary to add labels that describe what is shown.

Adding labels is rather easy. For a label displayed at the x-axis use plt.xlabel, and for a label displayed at the y-axis use plt.ylabel. We can also give our plot a title using plt.title.

In [ ]:
#prepare some dummy data
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red')
plt.scatter(x[2:], y[2:], s=100, c='blue')


# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
Out[ ]:
Text(0.5, 1.0, 'Relationship between exercise and grades')

You can furthermore use the argument labelpad (for ylabel/xlabel) or pad (for title) to determine the distance between the label and our graph.

In [ ]:
#prepare some dummy data
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red')
plt.scatter(x[2:], y[2:], s=100, c='blue')


# add labelpad 
plt.xlabel('The number of times the student read this notebook', labelpad=10 )
# we can also enter negative values for labelpad
plt.ylabel('The grade of the student', labelpad=-35)
# add a title
plt.title('Relationship between exercise and grades', pad=10)
Out[ ]:
Text(0.5, 1.0, 'Relationship between exercise and grades')

Adding a legend

While our graph looks better now, the difference between the red and blue points is still unclear. Additional information like this can be displayed using a legend. We define the legend entries with the "label" parameter of each plt.scatter call. After this, we also have to draw the legend by calling plt.legend().

In [ ]:
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red', label= "Group A")
plt.scatter(x[2:], y[2:], s=100, c='blue',label = "Group B" )


# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
plt.legend()
Out[ ]:
<matplotlib.legend.Legend at 0x7fa916fd81d0>

Better, but not quite there yet. Let's customize our legend as follows:

In [ ]:
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red', label= "Group A")
plt.scatter(x[2:], y[2:], s=100, c='blue',label = "Group B" )


# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
plt.legend(loc="upper left", frameon=True, title='Legend')
Out[ ]:
<matplotlib.legend.Legend at 0x7fa916fbe390>
In [ ]:
#feel free to check the documentation to see which further parameters for our legend exist
plt.legend?

Line Plot with multiple lines

There are various ways to plot multiple sets of data. The plt.plot() documentation gives us the following three possibilities:

  • The most straightforward way is just to call plot multiple times.
    Example:

    plot(x1, y1, 'bo')
    plot(x2, y2, 'go')
  • Alternatively, if your data is already a 2d array, you can pass it
    directly to x, y. A separate data set will be drawn for every
    column.

    Example: an array a where the first column represents the x
    values and the other columns are the y columns:

    plot(a[:, 0], a[:, 1:])
  • The third way is to specify multiple sets of [x], y, [fmt]
    groups:

    plot(x1, y1, 'g^', x2, y2, 'g-')

Let's look at an example of the third way:

In [ ]:
linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
# plot the linear data and the exponential data
plt.plot(linear_data, 'm-o', exponential_data, '-v')
plt.show()
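
For comparison, the first two ways can be sketched as follows. This is only an illustration under our own assumptions: the names x_values, columns, and lines are ours, and np.column_stack is just one of several ways to build the 2D array.

```python
import numpy as np
import matplotlib.pyplot as plt

linear_data = [1, 2, 3, 4, 5, 6, 7, 8]
exponential_data = [x**2 for x in linear_data]
x_values = range(len(linear_data))

# First way: call plot once per data set; both lines share the same axes.
plt.plot(x_values, linear_data, "m-o")
plt.plot(x_values, exponential_data, "-v")

# Second way: put each data set into one column of a 2D array.
plt.figure()  # start a new figure so the two approaches do not overlap
columns = np.column_stack([linear_data, exponential_data])  # shape (8, 2)
lines = plt.plot(x_values, columns)  # one line per column
```

Calling plot several times gives the most control over each line's style, while the 2D array form is convenient when the data already comes as a table.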

Filling Area between lines

Using the function plt.fill_between, we can fill the area between two curves that share the same x-coordinates.

The curves are defined by the points (x, y1) and (x, y2). This creates one or multiple polygons describing the filled area. Common parameters of this function are:

  • x = The x-coordinates of the nodes defining the curves.
  • y1 = The y-coordinates of the nodes defining the first curve.
  • y2 = The y-coordinates of the nodes defining the second curve.
  • where = A boolean mask that selects which horizontal regions are filled; regions where it is False are excluded.
  • facecolor = The color of the filled area.
In [ ]:
# for more info check out:
plt.fill_between?
In [ ]:
# let's fill the area between our two curves from above
linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
# plot the linear data and the exponential data
plt.plot(linear_data, 'm-o', exponential_data, '-v')


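# plt.plot above was called without x-values, so it used the indices 0..7;
# we pass the same indices to fill_between
plt.fill_between(x=range(len(linear_data)),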
                 y1=linear_data, 
                 y2=exponential_data,
                 facecolor='blue',
                 alpha=0.25)
Out[ ]:
<matplotlib.collections.PolyCollection at 0x7fa916e2c290>
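
The where parameter from the list above is not used in this cell. Here is a minimal sketch of how it could be applied, assuming we only want to fill the region where the second curve exceeds 20 (an arbitrary threshold chosen for illustration). NumPy arrays are used so that the comparison yields a boolean mask.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(8)                   # 0..7, the indices plt.plot uses by default
linear_data = np.arange(1, 9)      # 1, 2, ..., 8
exponential_data = linear_data**2  # 1, 4, ..., 64

plt.plot(x, linear_data, "m-o", x, exponential_data, "-v")

# Boolean mask: True where the second curve is above 20.
mask = exponential_data > 20

# Only the regions where mask is True are filled.
plt.fill_between(x, linear_data, exponential_data,
                 where=mask, facecolor="blue", alpha=0.25)
```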

Exercise For You
Try to guess the output of the following code snippet.

linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
plt.plot(linear_data, 'm-o', exponential_data, '-v')

plt.fill_between(range(len(linear_data)), [4, 14, 4, 14, 4, 14, 4, 14], 
[40, 30, 40, 30, 40, 30, 40, 30],
facecolor='blue', alpha=0.25)

Line Plots with Dates

NumPy can represent calendar dates with the dtype datetime64[D] (day resolution). Using np.arange() with this dtype, we get one entry per day from a start date up to, but not including, an end date. This looks as follows:

In [ ]:
import numpy as np
linear_data = [1,2,3,4]
exponential_data = [ x**2 for x in linear_data ]
observation_dates = np.arange('2020-04-27',
                              '2020-05-01',
                              dtype='datetime64[D]')

plt.plot(observation_dates, linear_data, '-o', observation_dates, exponential_data, '-o')
Out[ ]:
[<matplotlib.lines.Line2D at 0x7fa916d78410>,
 <matplotlib.lines.Line2D at 0x7fa916d19350>]
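
To see what np.arange actually produced, we can inspect the array on its own. The short sketch below repeats the call from the cell above and prints the result.

```python
import numpy as np

observation_dates = np.arange("2020-04-27", "2020-05-01", dtype="datetime64[D]")

print(observation_dates)
# The end date is excluded, so the last entry is 2020-04-30.
print(len(observation_dates), observation_dates[-1])
```

This is why four dates match the four values in linear_data.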

Label Rotation

As we can see, the labels on the x-axis (the so-called tick labels) are now rather cramped. Let's see how we can manually change this by slightly rotating them.

In a first step, we get the current axes of our plot using plt.gca(). From these axes we are only interested in the x-axis, which we access via the .xaxis attribute. From this axis we then get each individual tick label and rotate it by 45 degrees.

In [ ]:
plt.plot(observation_dates, linear_data, '-o', observation_dates, exponential_data, '-o')
x_axis = plt.gca().xaxis
# rotate the tick labels for the x axis
for item in x_axis.get_ticklabels():
    item.set_rotation(45)
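
A shorter alternative, sketched here as one possible option, is to let plt.xticks apply the rotation to all x tick labels at once. The data is repeated so that the cell runs on its own; the names locations and labels are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

observation_dates = np.arange("2020-04-27", "2020-05-01", dtype="datetime64[D]")
linear_data = [1, 2, 3, 4]

plt.plot(observation_dates, linear_data, "-o")

# Passing only keyword arguments leaves the tick positions unchanged
# and applies the text property (here: rotation) to every x tick label.
locations, labels = plt.xticks(rotation=45)
```

The loop over the tick labels remains useful when individual labels need different settings.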