The data we want to visualize often comes in the form of a pandas DataFrame, so let's look at how DataFrames and our plotting library interact. The interaction is quite simple. Say we have a DataFrame df with some sample data. We can plot it by calling df.plot(), which by default draws one line per column against the index.
import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
%matplotlib inline
#create a sample dataframe df
np.random.seed(123)
df = pd.DataFrame({'A': np.random.randn(365).cumsum(0),'B': np.random.randn(365).cumsum(0) + 20,
'C': np.random.randn(365).cumsum(0) - 20}, index=pd.date_range('04/27/2017', periods=365))
#plot df
df.plot()
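To make the interaction between pandas and matplotlib more concrete: df.plot() returns the matplotlib Axes object it drew on, so the usual matplotlib methods can be applied afterwards. The following is a minimal sketch under that assumption; the title and label texts are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Sample data in the same style as the DataFrame above
np.random.seed(123)
df = pd.DataFrame(
    {"A": np.random.randn(365).cumsum(),
     "B": np.random.randn(365).cumsum() + 20,
     "C": np.random.randn(365).cumsum() - 20},
    index=pd.date_range("04/27/2017", periods=365),
)

# df.plot() returns the matplotlib Axes it drew on
ax = df.plot(figsize=(8, 4))

# From here on, ordinary matplotlib calls apply (texts are illustrative)
ax.set_title("Cumulative random walks")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.legend(title="Column")
```

Keeping a reference to the returned Axes is what allows the later examples to call methods such as ax.set_aspect() on a pandas plot.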
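Within the df.plot() method we can furthermore use the kind parameter to specify which type of plot to draw. Options include 'line' (the default), 'bar', 'barh', 'hist', 'box', 'kde' (or 'density'), 'area', 'pie', 'scatter' and 'hexbin'.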
Let's try out the scatter plot:
df.plot(x="A", y="B", kind="scatter")
Because a scatter plot offers two axes plus the colour of each data point, it is especially useful for showing three-dimensional data in a flat, two-dimensional figure. Just think of our iris dataset from the beginning:
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
iris = load_iris()
df = pd.DataFrame(data= np.c_[iris['data'], iris['target']],
columns= iris['feature_names'] + ['target'])
# drop() returns a new DataFrame, so the result has to be assigned back
df = df.drop(columns=["petal length (cm)", "petal width (cm)"])
x_index = 0
y_index = 1
# this formatter will label the colorbar with the correct target names
formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])
plt.figure(figsize=(8, 6))
plt.scatter(iris.data[:, x_index], iris.data[:, y_index], c=iris.target)
cb = plt.colorbar(ticks=[0, 1, 2], format=formatter)
cb.ax.set_ylabel('Flower Species', rotation=270, labelpad = 15)
plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])
plt.tight_layout()
plt.show()
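The example above plots straight from the NumPy arrays in iris. As a hedged alternative, the same picture can be drawn from a DataFrame with pandas alone. This sketch assumes scikit-learn is installed; the name iris_df is chosen here for illustration, and the colorbar shows the numeric target codes rather than the species names.

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Build a DataFrame with the four measurements plus the species code
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["target"] = iris.target  # 0, 1, 2 encode the three species

# Two columns give the position, a third column gives the colour
ax = iris_df.plot.scatter(
    x="sepal length (cm)",
    y="sepal width (cm)",
    c="target",
    cmap="viridis",
    figsize=(8, 6),
)
```

The matplotlib version remains the better choice when the colorbar should be labelled with the species names, since it gives direct access to the colorbar formatter.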
Another way to create a scatter plot from a DataFrame is the df.plot.scatter() method. Its use is quite similar to the above; here a third column sets the colour of the points.
np.random.seed(123)
df = pd.DataFrame({'Length': np.random.randn(365).cumsum(0),
'Width': np.random.randn(365).cumsum(0) + 20,
'Height': np.random.randn(365).cumsum(0) - 20},
index=pd.date_range('04/27/2020', periods=365))
ax = df.plot.scatter('Length', 'Height', c='Width', cmap='viridis')
ax.set_aspect("equal")
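To illustrate that the two spellings are interchangeable, the following sketch draws the same scatter plot once with kind="scatter" and once with df.plot.scatter(), then compares the plotted points. The side-by-side layout and the variable names are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(123)
df = pd.DataFrame({
    "Length": np.random.randn(365).cumsum(),
    "Width": np.random.randn(365).cumsum() + 20,
    "Height": np.random.randn(365).cumsum() - 20,
})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Variant 1: the generic plot method with kind="scatter"
df.plot(x="Length", y="Height", kind="scatter", ax=ax_left,
        title='kind="scatter"')

# Variant 2: the dedicated accessor method
df.plot.scatter(x="Length", y="Height", ax=ax_right,
                title="df.plot.scatter()")

# Each Axes holds one scatter collection; compare the point coordinates
pts_left = ax_left.collections[0].get_offsets()
pts_right = ax_right.collections[0].get_offsets()
print(np.allclose(pts_left, pts_right))
```

Which spelling to use is largely a matter of taste; the accessor form has the advantage that each plot type gets its own documented signature.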
If we have higher-dimensional data, it might be useful to plot a 3D graph. As this involves rather advanced Python, it is not going to be relevant for the exam. The following example should nonetheless give you an idea of how it works:
from mpl_toolkits.mplot3d import Axes3D
#create sample dataframe with three dimensions
np.random.seed(123)
df = pd.DataFrame({'Length': np.random.randn(365).cumsum(0), 'Width': np.random.randn(365).cumsum(0) + 20,
'Height': np.random.randn(365).cumsum(0) - 20},
index=pd.date_range('04/27/2020', periods=365))
#create 3D scatter plot
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df["Length"].values, df["Height"].values, df["Width"].values, c=df["Width"].values)
ax.set_xlabel('Length')
ax.set_ylabel('Height')
ax.set_zlabel('Width')
Note: pandas does not offer a 3D plot type through df.plot(), so 3D plots have to be created with matplotlib directly.
Using the df.plot.box() method, one can furthermore create a box plot from a DataFrame, with one box per column:
np.random.seed(123)
df = pd.DataFrame({'Length': np.random.randn(365).cumsum(0), 'Width': np.random.randn(365).cumsum(0) + 20, 'Height': np.random.randn(365).cumsum(0) - 20},
index=pd.date_range('04/27/2020', periods=365))
#create box plot
df.plot.box()
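To read the box plot, it can help to compute the numbers it summarises. The sketch below assumes matplotlib's default whisker setting (whis=1.5), under which points more than 1.5 times the interquartile range away from the box are drawn as outliers. The variable names are chosen for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    "Length": np.random.randn(365).cumsum(),
    "Width": np.random.randn(365).cumsum() + 20,
    "Height": np.random.randn(365).cumsum() - 20,
})

# The box spans Q1 to Q3; the line inside the box marks the median
quartiles = df.quantile([0.25, 0.5, 0.75])
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]

# Assumed default whis=1.5: values beyond these limits count as outliers
lower_limit = quartiles.loc[0.25] - 1.5 * iqr
upper_limit = quartiles.loc[0.75] + 1.5 * iqr
outliers = ((df < lower_limit) | (df > upper_limit)).sum()

print(quartiles)
print(outliers)
```

Comparing the printed quartiles with the plot is a quick way to check that the boxes are being read correctly.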