Understanding Arrays in Python

A NumPy array is an iterable data type that is, in essence, very similar to a list. However, all elements within an array must be of the same type.

Due to their many possible applications, arrays are among the most used objects in the NumPy library.

We can create an array by defining a list and passing this list into NumPy's array function. Before we do that, let's have another look at lists first.

A Python List Is More Than Just a List

Let's consider what happens when we use a Python data structure that holds many Python objects. The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:

In [ ]:
L = list(range(10))
L
Out[ ]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [ ]:
type(L[0])
Out[ ]:
int

Or, similarly, a list of strings:

In [ ]:
L2 = [str(c) for c in L]
L2
Out[ ]:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
In [ ]:
type(L2[0])
Out[ ]:
str

Because of Python's dynamic typing, we can even create heterogeneous lists:

In [ ]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]
Out[ ]:
[bool, str, float, int]

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, each item is a complete Python object. In the special case that all items are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array.

At the implementation level, the array essentially contains a single pointer to one contiguous block of data. The Python list, on the other hand, contains a pointer to a block of pointers, each of which in turn points to a full Python object like the Python integer we saw earlier. Again, the advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.
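
As a rough illustration of the storage difference described above, the following sketch compares the memory used by a list of integers with that of an equivalent array. The names n, py_list, and np_arr are chosen here for illustration, and the exact byte count for the list depends on the Python version and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers; each element is a separate, full Python int object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array stores the raw values in one contiguous buffer (8 bytes per int64).
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```

The list total counts both the block of pointers and the objects they point to, which is why it comes out several times larger than the array's buffer.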

Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in array module can be used to create dense arrays of a uniform type:

In [ ]:
import array
L = list(range(10))
A = array.array('i', L)
A
Out[ ]:
array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Here 'i' is a type code indicating that the contents are signed integers (C int).
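
To see what the fixed type means in practice, here is a small sketch showing that an array.array rejects values that do not match its type code. The variable names are illustrative, and the item size printed for 'i' is platform-dependent.

```python
import array

A = array.array('i', range(5))
print(A.typecode, A.itemsize)  # type code and bytes per item

# 'd' is the type code for double-precision floats
floats = array.array('d', [0.5, 1.5])

# Appending a float to an integer-typed array is refused
try:
    A.append(2.5)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```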

Much more useful, however, is the ndarray object of the NumPy package. While Python's array object provides efficient storage of array-based data, NumPy adds efficient operations on that data. We will explore these operations in later sections; here we'll demonstrate several ways of creating a NumPy array.

We'll start with the standard NumPy import, under the alias np:

In [ ]:
import numpy as np

Creating Arrays from Python Lists

First, we can use np.array to create arrays from Python lists:

In [ ]:
# integer array:
np.array([1, 4, 2, 5, 3])
Out[ ]:
array([1, 4, 2, 5, 3])

Remember that unlike Python lists, NumPy arrays are constrained to hold elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

In [ ]:
np.array([3.14, 4, 2, 3])
Out[ ]:
array([ 3.14,  4.  ,  2.  ,  3.  ])
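
One way to confirm which type NumPy chose is to inspect the dtype attribute of the result. The sketch below uses illustrative variable names; the integer dtype of the first array depends on the platform, so only its kind is checked.

```python
import numpy as np

ints = np.array([1, 4, 2])
mixed = np.array([3.14, 4, 2, 3])
with_complex = np.array([1, 2.5, 3 + 0j])

print(ints.dtype)          # a platform-dependent integer type
print(mixed.dtype)         # integers upcast to float64
print(with_complex.dtype)  # integers and floats upcast to complex128
```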

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:

In [ ]:
np.array([1, 2, 3, 4], dtype='float32')
Out[ ]:
array([ 1.,  2.,  3.,  4.], dtype=float32)
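
The choice of dtype affects how many bytes each element occupies, and an existing array can be converted with astype. The following is a minimal sketch with illustrative names; note that converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

x32 = np.array([1, 2, 3, 4], dtype='float32')
x64 = np.array([1, 2, 3, 4], dtype='float64')
print(x32.itemsize, x64.itemsize)  # bytes per element

# astype returns a converted copy; the original array is unchanged
y = np.array([1.7, -1.7]).astype(int)
print(y)
```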

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [ ]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])
Out[ ]:
array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

The inner lists are treated as rows of the resulting two-dimensional array.
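
The row interpretation can be checked through the ndim and shape attributes and by indexing. In this sketch the name grid is illustrative.

```python
import numpy as np

grid = np.array([range(i, i + 3) for i in [2, 4, 6]])

print(grid.ndim)   # number of dimensions
print(grid.shape)  # (rows, columns)
print(grid[0])     # first inner list, i.e. the first row
print(grid[:, 0])  # first element of every row, i.e. the first column
```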

Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:

In [ ]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)
Out[ ]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
In [ ]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)
Out[ ]:
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
In [ ]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)
Out[ ]:
array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14],
       [ 3.14,  3.14,  3.14,  3.14,  3.14]])
In [ ]:
# Create an array filled with a linear sequence
# Starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)
Out[ ]:
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
In [ ]:
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)
Out[ ]:
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ])
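
The two routines above treat the end value differently: np.arange excludes the stop value, while np.linspace includes it by default. The sketch below, with illustrative variable names, makes the difference explicit and shows the endpoint keyword of np.linspace.

```python
import numpy as np

a = np.arange(0, 20, 2)                     # stop value 20 is excluded
b = np.linspace(0, 20, 11)                  # stop value 20 is included
c = np.linspace(0, 20, 10, endpoint=False)  # same points as the arange call

print(a[-1], b[-1])
print(np.allclose(a, c))
```

As a rule of thumb, np.arange is convenient when the step size matters, and np.linspace when the number of points matters.
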
In [ ]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))
Out[ ]:
array([[ 0.99844933,  0.52183819,  0.22421193],
       [ 0.08007488,  0.45429293,  0.20941444],
       [ 0.14360941,  0.96910973,  0.946117  ]])
In [ ]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))
Out[ ]:
array([[ 1.51772646,  0.39614948, -0.10634696],
       [ 0.25671348,  0.00732722,  0.37783601],
       [ 0.68446945,  0.15926039, -0.70744073]])
In [ ]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))
Out[ ]:
array([[2, 3, 4],
       [5, 7, 8],
       [0, 5, 0]])
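
Because the three random examples above produce different values on every run, it can help to seed the generator when reproducible results are needed. The sketch below uses NumPy's Generator interface (np.random.default_rng, available in NumPy 1.17 and later) rather than the legacy functions shown above; the seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

u = rng.random((3, 3))           # uniform values in [0, 1)
z = rng.normal(0, 1, (3, 3))     # normal values, mean 0, standard deviation 1
k = rng.integers(0, 10, (3, 3))  # integers in [0, 10)

# A second generator with the same seed reproduces the same sequence
rng2 = np.random.default_rng(seed=42)
print(np.array_equal(u, rng2.random((3, 3))))
```
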
In [ ]:
# Create a 3x3 identity matrix
np.eye(3)
Out[ ]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
In [ ]:
# Create an uninitialized array of three floating-point values (the default dtype)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
Out[ ]:
array([ 1.,  1.,  1.])
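
Since np.empty returns floating-point values unless told otherwise, an integer buffer needs an explicit dtype. The sketch below, with illustrative names, shows the typical pattern of allocating with np.empty and then overwriting every element before reading it.

```python
import numpy as np

buf = np.empty(3)
print(buf.dtype)  # float64 by default; the contents are arbitrary

ibuf = np.empty(3, dtype=int)  # integer buffer, contents still arbitrary
ibuf[:] = [7, 8, 9]            # overwrite every element before use
print(ibuf)
```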

NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table. Note that when constructing an array, they can be specified using a string:

np.zeros(10, dtype='int16')

Or using the associated NumPy object:

np.zeros(10, dtype=np.int16)

Data type    Description
----------   -----------
bool_        Boolean (True or False) stored as a byte
int_         Default integer type (same as C long; normally either int64 or int32)
intc         Identical to C int (normally int32 or int64)
intp         Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8         Byte (-128 to 127)
int16        Integer (-32768 to 32767)
int32        Integer (-2147483648 to 2147483647)
int64        Integer (-9223372036854775808 to 9223372036854775807)
uint8        Unsigned integer (0 to 255)
uint16       Unsigned integer (0 to 65535)
uint32       Unsigned integer (0 to 4294967295)
uint64       Unsigned integer (0 to 18446744073709551615)
float_       Shorthand for float64 (alias removed in NumPy 2.0; use float64)
float16      Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32      Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64      Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_     Shorthand for complex128 (alias removed in NumPy 2.0; use complex128)
complex64    Complex number, represented by two 32-bit floats
complex128   Complex number, represented by two 64-bit floats

More advanced type specification is possible, such as specifying big- or little-endian numbers; for more information, refer to the NumPy documentation.
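
The limits of the integer types can be queried with np.iinfo, and byte order can be requested through the dtype string. The following sketch uses illustrative variable names; '>' and '<' are the standard dtype prefixes for big-endian and little-endian, and i4 denotes a 4-byte integer.

```python
import numpy as np

# Representable ranges of fixed-width integer types
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)
print(np.iinfo(np.uint16).max)

# Fixed-width integers wrap around silently on overflow in array arithmetic
a = np.array([127], dtype=np.int8)
wrapped = a + np.array([1], dtype=np.int8)
print(wrapped)

# The same value stored with different byte orders
big = np.array([1], dtype='>i4').tobytes()
little = np.array([1], dtype='<i4').tobytes()
print(big, little)
```

The wraparound in the middle example is one of the limitations worth keeping in mind when choosing a small integer type.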