import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
Let's consider the following line graph.
plt.plot([3, 5, 2], [2, 0, 8], "^k:")
Let's explore what the individual components actually do:
[3,5,2] holds the x-values, whereas [2,0,8] holds the y-values of our graph. This gives the three (x, y) points (3, 2), (5, 0), and (2, 8). The graph then draws a line between the points in the order in which they were specified.
"^k:" is again a format string (fmt), which consists of the following parts: fmt = '[marker][line][color]'. In our case, "^" stands for a triangle marker, "k" makes the markers and their connecting line black, and ":" turns the line into a dotted line. The parts may also appear in a different order, which is why "^k:" is accepted even though the color comes before the line style.
As we can see, documentation is key for understanding how plots work!
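To make the meaning of the format string more tangible, here is a small sketch that draws the same line twice: once with "^k:" and once with the keyword arguments marker, color, and linestyle spelled out. The variable names short_line and long_line are made up for this illustration, and the Agg backend is only selected so that the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

x, y = [3, 5, 2], [2, 0, 8]

# Short form: marker, color, and line style packed into one format string
(short_line,) = plt.plot(x, y, "^k:")

# Long form: the same three properties spelled out as keyword arguments
(long_line,) = plt.plot(x, y, marker="^", color="k", linestyle=":")

# Print the properties of both lines side by side for comparison
for name, line in [("format string", short_line), ("keywords", long_line)]:
    print(name, line.get_marker(), line.get_color(), line.get_linestyle())
```

The format string is handy for quick plots, while the keyword form tends to be easier to read once more properties are involved.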
Exercise For You
Again, try to guess which plot the following code produces. Use the empty cell below to verify your result.
plt.plot([2,3,4,5],[5,4,3,2], "Dg-.")
Hint: Use the documentation (plt.plot?) to find the meaning of the format string "Dg-.".
#run to access documentation
plt.plot?
# test your answer
Another commonly used plot type is the simple scatter plot, a close cousin of the line plot. Instead of points being joined by line segments, here the points are represented individually with a dot, circle, or other shape. The plot used in our iris dataset example above was a scatter plot.
plt.plot()
Technically, we can use the standard plt.plot() function together with the "o" format string (which draws markers but no line between the points) to create a scatter plot. This looks as follows:
plt.plot([2,3,4,5],[5,4,3,2], "o")
plt.scatter()
A second, more powerful method of creating scatter plots is the plt.scatter function, which can be used very similarly to the plt.plot function:
plt.scatter([2,3,4,5],[5,4,3,2])
The primary difference between plt.scatter and plt.plot is that plt.scatter can create scatter plots in which the properties of each individual point (size, face color, edge color, etc.) are individually controlled or mapped to data.
Let's show this by creating a scatter plot with points of several colors and sizes.
x = [1,2,3,4,5,6,7,8,9]
y = [1,2,3,4,5,4,3,2,1]
colors = ["green", "blue", "red", "yellow", "cyan", "black", "black", "white", "green"]
size = [100,150,200,250,300,250,200,150,100]
# creates a plot with points at x:y coordinates that have size s and colors c
plt.scatter(x, y, s=size, c=colors)
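The colors and sizes above were chosen by hand. As a rough sketch of what "mapped to data" can look like, the following variant derives both from a list of numbers. The measurements list is hypothetical and made up for illustration, and the choice of the "viridis" colormap is an assumption, not something this notebook prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 4, 5, 4, 3, 2, 1]

# Hypothetical measurements, one per point (made up for illustration)
measurements = [10, 20, 30, 40, 50, 40, 30, 20, 10]

# Marker area grows with the measurement; the color is looked up in a colormap
sizes = [5 * m for m in measurements]
points = plt.scatter(x, y, s=sizes, c=measurements, cmap="viridis")

# The colorbar shows which color stands for which measurement value
plt.colorbar(points, label="measurement")
```

When c is a list of numbers rather than color names, plt.scatter treats the numbers as data and translates them into colors via the colormap.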
Exercise For You
Try to guess the output of the following code snippet.
x = [1,2,3,4,5,6,7,8]
y = x
colors = ['green']*(len(x)-1)
colors.append('blue')
plt.scatter(x, y, s=100, c=colors)
#Test your answer here
Exercise For You
Try to guess the output of the following code snippet.
x = [1,2,3,4,5,6,7,8]
y = x
colors = ["orange" if x % 2 == 0 else "black" for x in range(len(x))]
plt.scatter(x, y, s=100, c=colors)
# Test your answer here
To make your graph more meaningful, it is often necessary to add labels that describe what is shown.
Adding labels is rather easy. For a label displayed at the x-axis, use plt.xlabel; for a label displayed at the y-axis, use plt.ylabel. We can also give our plot a title using plt.title.
#prepare some dummy data
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red')
plt.scatter(x[2:], y[2:], s=100, c='blue')
# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
You can also use the labelpad argument (for xlabel/ylabel) or the pad argument (for title) to set the distance, in points, between the text and the axes.
#prepare some dummy data
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red')
plt.scatter(x[2:], y[2:], s=100, c='blue')
# add labelpad
plt.xlabel('The number of times the student read this notebook', labelpad=10)
# we can also enter negative values for labelpad
plt.ylabel('The grade of the student', labelpad=-35)
# add a title
plt.title('Relationship between exercise and grades', pad=10)
While our graph looks better now, the difference between the red and blue points is still unclear. Additional information like this can be displayed using a legend. We can add the legend entries using the "label" parameter of the scatter plot. After this, we also have to draw the legend by calling plt.legend().
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red', label="Group A")
plt.scatter(x[2:], y[2:], s=100, c='blue', label="Group B")
# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
plt.legend()
Better, but not quite there yet. Let's customize our legend as follows:
x = [1,2,3,4,5,6]
y = x
plt.scatter(x[:2], y[:2], s=100, c='red', label="Group A")
plt.scatter(x[2:], y[2:], s=100, c='blue', label="Group B")
# add a label to the x axis
plt.xlabel('The number of times the student read this notebook')
# add a label to the y axis
plt.ylabel('The grade of the student')
# add a title
plt.title('Relationship between exercise and grades')
plt.legend(loc="upper left", frameon=True, title='Legend')
#feel free to check the documentation to see which further parameters for our legend exist
plt.legend?
There are various ways to plot multiple sets of data. The plt.plot() documentation gives us the following three possibilities:
The most straightforward way is just to call plot multiple times.
Example:
plot(x1, y1, 'bo')
plot(x2, y2, 'go')
Alternatively, if your data is already a 2D array, you can pass it directly to x, y. A separate data set will be drawn for every column.
Example: an array a where the first column represents the x values and the other columns are the y columns:
plot(a[:, 0], a[:, 1:])
The third way is to specify multiple sets of [x], y, [fmt] groups:
plot(x1, y1, 'g^', x2, y2, 'g-')
Let's look at an example of the third way. Both groups omit the optional x, so each series is plotted against its index (0, 1, 2, ...):
linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
# plot the linear data and the squared data (named exponential_data here)
plt.plot(linear_data, 'm-o', exponential_data, '-v')
plt.show()
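The second possibility from the documentation, passing a 2D array, can be sketched in a similar way. The array a below is built purely for illustration, with the x values in column 0 and two y series in columns 1 and 2; the use of np.column_stack to assemble it is just one convenient option.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 9)

# Column 0 holds the x values, columns 1 and 2 hold two y series
a = np.column_stack([x, x, x**2])

# One call draws one line per y column and returns a list of Line2D objects
lines = plt.plot(a[:, 0], a[:, 1:])
print(len(lines))
```

Note the slicing with a[:, 0] and a[:, 1:], which selects columns; a[0] would select the first row instead.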
Using the function plt.fill_between, we can fill the area between two horizontal curves, that is, two sets of y values over the same x values.
The curves are defined by the points (x, y1) and (x, y2). This creates one or more polygons describing the filled area. Common parameters used with this function are x, y1, y2, facecolor (the fill color), and alpha (the transparency):
# for more info check out:
plt.fill_between?
# let's fill the area between our two curves from above
linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
# plot the linear data and the exponential data
plt.plot(linear_data, 'm-o', exponential_data, '-v')
plt.fill_between(x=range(len(linear_data)),
                 y1=linear_data,
                 y2=exponential_data,
                 facecolor='blue',
                 alpha=0.25)
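It may not be obvious why x is set to range(len(linear_data)) here. The following sketch, under the assumption that the curves are plotted without explicit x values as above, reads back the x positions that plt.plot chose on its own and passes them to plt.fill_between. The names squared, default_x, and band are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

linear_data = [1, 2, 3, 4, 5, 6, 7, 8]
squared = [v**2 for v in linear_data]

# No x values given: plot() places the points at x = 0, 1, ..., 7
(line,) = plt.plot(linear_data, "m-o")
default_x = list(line.get_xdata())
print(default_x)

# fill_between() has no such default, so the same positions are passed explicitly
band = plt.fill_between(default_x, linear_data, squared,
                        facecolor="blue", alpha=0.25)
```

If the curves were plotted against their own x values instead, those same x values would have to be handed to plt.fill_between so that the filled area lines up with the curves.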
Exercise For You
Try to guess the output of the following code snippet.
linear_data = [1,2,3,4,5,6,7,8]
exponential_data = [ x**2 for x in linear_data ]
plt.plot(linear_data, 'm-o', exponential_data, '-v')
plt.fill_between(range(len(linear_data)), [4, 14, 4, 14, 4, 14, 4, 14],
                 [40, 30, 40, 30, 40, 30, 40, 30],
                 facecolor='blue', alpha=0.25)
Setting the data type to datetime64[D] makes NumPy return dates with day precision. We can then generate every day between two dates using np.arange(); the end date itself is not included. This looks as follows:
import numpy as np
linear_data = [1,2,3,4]
exponential_data = [ x**2 for x in linear_data ]
observation_dates = np.arange('2020-04-27',
                              '2020-05-01',
                              dtype='datetime64[D]')
plt.plot(observation_dates, linear_data, '-o', observation_dates, exponential_data, '-o')
As we can see, the labels on the x-axis (the so-called tick labels) are now rather cramped. Let's see how we can change this manually by slightly rotating them.
In a first step, we get the current axes of our plot using plt.gca(). From these axes we are only interested in the ticks on the x-axis. Therefore, we save the x-axis via the .xaxis attribute. From this axis we then get each individual tick label and rotate it by 45 degrees.
plt.plot(observation_dates, linear_data, '-o', observation_dates, exponential_data, '-o')
x = plt.gca().xaxis
# rotate the tick labels for the x axis
for item in x.get_ticklabels():
    item.set_rotation(45)
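The loop above makes each step explicit. As a shorter alternative, here is a sketch that uses the tick_params method of the axes with its labelrotation argument, which should give the same rotation in a single call. Creating the figure with plt.subplots() is a choice made for this illustration; the notebook itself works with plt.gca().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

linear_data = [1, 2, 3, 4]

# The end date is exclusive, so this yields 27, 28, 29 and 30 April 2020
observation_dates = np.arange("2020-04-27", "2020-05-01", dtype="datetime64[D]")

fig, ax = plt.subplots()
ax.plot(observation_dates, linear_data, "-o")

# Rotate every x tick label of this Axes by 45 degrees in one call
ax.tick_params(axis="x", labelrotation=45)
```

The explicit loop remains useful when only some of the labels should be changed, for example every second one.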