< Previous Page | Home Page | Next Page >
In data science, it is often useful to visualize the data at hand. Humans are very visual creatures: we understand things better when we see them. After all, some results, relationships, or peculiar patterns in the data are hard to spot just by looking at tables of numbers.
Take, for example, the widely used "iris" data set, which describes three species of iris flowers (virginica, setosa and versicolor) by the lengths and widths of their sepals and petals. If we load the data set into a default DataFrame (remember the lesson on pandas) and show only the sepal measurements, we get the following.
# Load the iris dataset (bundled with scikit-learn) into a DataFrame
from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
# Show only the sepal columns and the target; drop() returns a new DataFrame and leaves df unchanged
df.drop(labels=["petal length (cm)", "petal width (cm)"], axis=1)
As we can see, the bare table of sepal lengths and widths does not really tell us much. On the contrary, it might even lead to confusion.
If, however, we plot the same data set and add some colour coding, the relationship between a flower species and its sepal dimensions becomes much more apparent!
# no need to understand the following code - just run it and compare the table with the plot
import matplotlib.pyplot as plt
x_index = 0
y_index = 1
# this formatter will label the colorbar with the correct target names
formatter = plt.FuncFormatter(lambda i, *args: iris.target_names[int(i)])
plt.figure(figsize=(8, 6))
plt.scatter(iris.data[:, x_index], iris.data[:, y_index], c=iris.target)
cb = plt.colorbar(ticks=[0, 1, 2], format=formatter)
cb.ax.set_ylabel('Flower Species', rotation=270, labelpad = 15)
plt.xlabel(iris.feature_names[x_index])
plt.ylabel(iris.feature_names[y_index])
plt.tight_layout()
plt.show()
As we can see, the plot gives us a much clearer and more immediate understanding of the data. We can see, for example, that setosa flowers tend to have short, wide sepals, whereas the sepals of virginica flowers tend to be longer.
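If we want to double-check what the plot suggests, we can compare the average sepal dimensions per species. The following is only a minimal sketch: the `species` column and the `sepal_means` variable are names chosen here for illustration and are not part of the lesson's code.

```python
from sklearn.datasets import load_iris
import pandas as pd

iris = load_iris()

# Build a DataFrame with the four measurements
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Hypothetical helper column: translate the numeric target (0, 1, 2) into species names
df["species"] = iris.target_names[iris.target]

# Mean sepal length and width for each species
sepal_means = df.groupby("species")[["sepal length (cm)", "sepal width (cm)"]].mean()
print(sepal_means)
```

The printed table has one row per species, so the shorter, wider setosa sepals and the longer virginica sepals can be read off directly and compared with the scatter plot.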