In the previous chapter, we dove into detail on NumPy and its ndarray object, which provides efficient storage and manipulation of dense typed arrays in Python.
Here we'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.
Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a DataFrame.
DataFrames are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.
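To make the description above concrete, here is a minimal sketch of a small DataFrame with row labels, column labels, mixed column types, and one missing value. The column names and values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: names and values are illustrative only.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],      # strings
        "population_m": [0.7, np.nan, 3.1],    # floats, with one missing value
        "capital": [True, True, False],        # booleans
    },
    index=["a", "b", "c"],                     # explicit row labels
)

print(df.dtypes)                         # each column keeps its own type
print(df.loc["b"])                       # select a row by its label
print(df["population_m"].isna().sum())   # count the missing entries
```

Each column keeps its own type, and the missing entry is stored as NaN rather than forcing the whole table into a single type.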
As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.
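As a rough sketch of what such operations look like, the following groups and joins two small tables, much as one might with SQL's GROUP BY and JOIN or with a spreadsheet pivot table and lookup. The tables, column names, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical tables; names and values are made up for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "amount": [10, 20, 30, 40],
})
managers = pd.DataFrame({
    "region": ["north", "south"],
    "manager": ["Ana", "Bo"],
})

# Roughly a GROUP BY or pivot table: total amount per region.
totals = sales.groupby("region", as_index=False)["amount"].sum()

# Roughly a JOIN or lookup: attach the manager for each region.
report = totals.merge(managers, on="region")
print(report)
```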
Installing Pandas requires NumPy to be installed on your system.
Details on installation can be found in the Pandas documentation. Both packages can be installed with the pip command.
Once Pandas is installed, you can import it and check the version:
import pandas
pandas.__version__
Just as we generally import NumPy under the alias np, we will import Pandas under the alias pd:
import pandas as pd
This import convention will be used throughout the remainder of this course.
As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ? character).
For example, to display all the contents of the pandas namespace, you can type:
In [3]: pd.<TAB>
And to display Pandas's built-in documentation, you can use this:
pd?
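Tab completion and the ? character are IPython features. Outside IPython, a rough equivalent can be sketched with Python's built-in dir function and the package docstring. This is only an approximation of what IPython displays, not a reproduction of it.

```python
import pandas as pd

# Public names in the pandas namespace, similar to what pd.<TAB> lists.
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names), "public names, for example:", public_names[:5])

# First line of the package docstring, a small slice of what pd? displays.
doc = (pd.__doc__ or "").strip()
print(doc.splitlines()[0] if doc else "(no docstring)")
```

Calling help(pd) in a plain Python session prints the full documentation text.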
More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/.