There's more...

Appending a single row to a DataFrame is a fairly expensive operation and if you find yourself writing a loop to append single rows of data to a DataFrame, then you are doing it wrong. Let's first create 1,000 rows of new data as a list of Series:

>>> random_data = []
>>> for i in range(1000):
        d = dict()
        for k, v in data_dict.items():
            if isinstance(v, str):
                d[k] = np.random.choice(list('abcde'))
            else:
                d[k] = np.random.randint(10)
        random_data.append(pd.Series(d, name=i + len(bball_16)))

>>> random_data[0].head()
2B    3
3B    9
AB    3
BB    9
CS    4
Name: 16, dtype: object

Let's time how long it takes to loop through each item making one append at a time:

>>> %%timeit
>>> bball_16_copy = bball_16.copy()
>>> for row in random_data:
        bball_16_copy = bball_16_copy.append(row)
4.88 s ± 190 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That took nearly five seconds for only 1,000 rows. If we instead pass in the entire list of Series, we get an enormous speed increase:

>>> %%timeit
>>> bball_16_copy = bball_16.copy()
>>> bball_16_copy = bball_16_copy.append(random_data)
78.4 ms ± 6.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

By passing in the list of Series, the time has been reduced to under one-tenth of a second. Internally, pandas converts the list of Series to a single DataFrame and then makes the append.

Table of Contents for There's more...

Create new playlist

Sign In

Sign Up

Table of Contents for
There's more...