Stacking NumPy arrays

When working with two-dimensional data arrays, some common operations, such as adding cases (rows) and variables (columns), can be carried out easily and quickly with NumPy functions.

The most common such operation is adding more cases (rows) to your array.

Let's start off by creating an array:

In: import numpy as np
    dataset = np.arange(10*5).reshape(10,5)

Now, let's create a single row and a block of several rows, which we will append to the array:

In: single_line = np.arange(1*5).reshape(1,5)
    a_few_lines = np.arange(3*5).reshape(3,5)

We can first try to add a single line:

In: np.vstack((dataset,single_line))

All you have to do is pass a tuple containing the arrays in the order in which they should be stacked vertically: here, the original array followed by the new row. The arrays must have the same number of columns. The same command works if you have several rows to add:

In: np.vstack((dataset,a_few_lines))

Or, if you want to add the same single line more than once, the tuple can represent the sequential structure of your newly concatenated array:

In: np.vstack((dataset,single_line,single_line))
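To check what the three vstack calls above produce, it can help to look at the shapes of the results. The following is a minimal sketch; the names one_more, three_more, and two_repeats are illustrative and are not part of the original example:

```python
import numpy as np

dataset = np.arange(10 * 5).reshape(10, 5)
single_line = np.arange(1 * 5).reshape(1, 5)
a_few_lines = np.arange(3 * 5).reshape(3, 5)

# vstack returns a new array each time; dataset itself is not modified
one_more = np.vstack((dataset, single_line))
three_more = np.vstack((dataset, a_few_lines))
two_repeats = np.vstack((dataset, single_line, single_line))

print(one_more.shape)     # (11, 5)
print(three_more.shape)   # (13, 5)
print(two_repeats.shape)  # (12, 5)
print(dataset.shape)      # (10, 5)
```

In each case, only the number of rows grows; the number of columns stays at 5.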

Another common situation is when you have to add a new variable to an existing array. In this case, you have to use hstack (h stands for horizontal) instead of the just-presented vstack command (where v is vertical).

Let's pretend that you have to add a bias of unit values to your original array:

In: bias = np.ones(10).reshape(10,1)
    np.hstack((dataset,bias))

If you don't want to reshape bias, you can add it as a one-dimensional sequence by using the column_stack() function (bias can then be any data sequence whose length equals the number of rows of the array). It obtains the same result with fewer concerns about data reshaping:

In: bias = np.ones(10)
    np.column_stack((dataset,bias))
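The two approaches can be compared side by side. This is a small sketch under the same setup; the names bias_1d, bias_2d, and hstack_1d_failed are introduced here only for illustration:

```python
import numpy as np

dataset = np.arange(10 * 5).reshape(10, 5)
bias_1d = np.ones(10)              # shape (10,)
bias_2d = bias_1d.reshape(10, 1)   # shape (10, 1)

with_hstack = np.hstack((dataset, bias_2d))
with_column_stack = np.column_stack((dataset, bias_1d))

same = np.array_equal(with_hstack, with_column_stack)
print(with_hstack.shape, same)  # (10, 6) True

# hstack does not accept the 1D bias next to the 2D dataset
try:
    np.hstack((dataset, bias_1d))
    hstack_1d_failed = False
except ValueError:
    hstack_1d_failed = True
print(hstack_1d_failed)  # True
```

Note that, since np.ones creates floating-point values, the integer dataset is converted to floating point in the stacked result.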

Adding rows and columns to two-dimensional arrays covers most of what you need to effectively wrangle your data in data science projects. Now, let's see a couple of more specific functions for slightly different data problems.

First, although two-dimensional arrays are the norm, you can also operate on a three-dimensional data structure. So, dstack(), which is analogous to hstack() and vstack() but operates on the third axis, will come in quite handy:

In: np.dstack((dataset*1,dataset*2,dataset*3))

In this example, the third dimension holds copies of the original 2D array, each multiplied by a different factor, which mimics a progressive rate of change (a time or change dimension).
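A quick way to see how the stacked arrays are laid out is to inspect the shape of the result and take slices along the last axis. The following sketch uses the illustrative names cube and layer_matches:

```python
import numpy as np

dataset = np.arange(10 * 5).reshape(10, 5)
cube = np.dstack((dataset * 1, dataset * 2, dataset * 3))

print(cube.shape)  # (10, 5, 3)

# Each slice along the last axis is one of the stacked 2D arrays
layer_matches = np.array_equal(cube[:, :, 1], dataset * 2)
print(layer_matches)  # True

# A single cell seen across the three layers: dataset[1, 2] is 7,
# so the values are 7, 14, and 21
print(cube[1, 2, :])
```

The rows and columns of the original array are preserved, and the three inputs become the three positions along the new third axis.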

A further, trickier variation is the insertion of a row or, more frequently, a column at a specific position in your array. As you may recall, arrays are contiguous chunks of memory, so an insertion actually requires creating a new array and splitting the original one around the insertion point. The NumPy insert command helps you to do so in a fast and hassle-free way:

In: np.insert(dataset, 3, bias, axis=1)

You just have to define the array where you wish to insert (dataset), the position (index 3), the sequence you want to insert (in this case, the one-dimensional array bias), and the axis along which you would like to operate the insertion (axis=1 works along the second axis, so the sequence is inserted as a new column).

Naturally, you can insert entire arrays (not just vectors such as bias) by ensuring that the array to be inserted is aligned with the dimension along which we are operating the insertion. In this example, in order to insert the same array into itself, we have to transpose it as an inserted element:

In: np.insert(dataset, 3, dataset.T, axis=1)
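The effect of the two column insertions can be verified by looking at the resulting shapes and contents. This is a minimal sketch, assuming bias is the one-dimensional array created earlier; with_column and with_block are illustrative names:

```python
import numpy as np

dataset = np.arange(10 * 5).reshape(10, 5)
bias = np.ones(10)

# Insert a single column of ones before column index 3
with_column = np.insert(dataset, 3, bias, axis=1)
print(with_column.shape)  # (10, 6)
print(with_column[0])     # [0 1 2 1 3 4]

# Insert a whole block of five columns (the array itself) at index 3
with_block = np.insert(dataset, 3, dataset.T, axis=1)
print(with_block.shape)   # (10, 10)

# The inserted block sits between the first three and the last two columns
block_ok = np.array_equal(with_block[:, 3:8], dataset)
print(block_ok)           # True
```

The result keeps the data type of the original array, so the floating-point ones in bias are stored as integers here.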

You can also make insertions on different axes (in the following case, axis 0, which inserts the sequence as a new row, but you can also operate on any dimension of an array that you may have):

In: np.insert(dataset, 3, np.ones(5), axis=0)

In effect, the original array is split at the specified position along the chosen axis, and the split data is then concatenated with the new data to be inserted placed in between.
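The split-and-concatenate idea can be reproduced by hand with slicing and vstack. This sketch only illustrates the concept and is not meant to describe how NumPy implements insert internally; position, new_row, manual, and inserted are illustrative names:

```python
import numpy as np

dataset = np.arange(10 * 5).reshape(10, 5)
new_row = np.ones(5, dtype=dataset.dtype)
position = 3

# Split the array at the position and stack the pieces around the new row
manual = np.vstack((dataset[:position], new_row, dataset[position:]))

# The same operation with insert
inserted = np.insert(dataset, position, new_row, axis=0)

print(inserted.shape)                    # (11, 5)
print(np.array_equal(manual, inserted))  # True
print(dataset.shape)                     # (10, 5)
```

Both approaches return a new array and leave dataset unchanged.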
