There's more...

Appending a single row to a DataFrame is a fairly expensive operation, because each append returns a new DataFrame rather than modifying the existing one. If you find yourself writing a loop that appends one row at a time, there is almost certainly a faster approach. Let's first create 1,000 rows of new data as a list of Series:

>>> random_data = []
>>> for i in range(1000):
        d = dict()
        for k, v in data_dict.items():
            if isinstance(v, str):
                d[k] = np.random.choice(list('abcde'))
            else:
                d[k] = np.random.randint(10)
        random_data.append(pd.Series(d, name=i + len(bball_16)))

>>> random_data[0].head()
2B    3
3B    9
AB    3
BB    9
CS    4
Name: 16, dtype: object
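
The `name` given to each Series is what becomes its row label once the Series is placed in a DataFrame, which is why the loop uses `i + len(bball_16)`. A minimal sketch of that behavior follows; the `small` DataFrame and its columns are hypothetical stand-ins for `bball_16`, not data from the recipe:

```python
import pandas as pd

# Hypothetical stand-in for bball_16: three rows with the default 0..2 index
small = pd.DataFrame({'AB': [10, 20, 30], 'team': ['x', 'y', 'z']})

# Name the new row with the next free index label, as the loop above does
new_row = pd.Series({'AB': 40, 'team': 'w'}, name=len(small))

# A list of Series becomes a DataFrame whose index holds the Series names
as_frame = pd.DataFrame([new_row])
print(as_frame.index.tolist())
```

Naming each Series this way keeps the new labels from colliding with the labels already present in the original DataFrame.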

Let's time how long it takes to loop through each item making one append at a time:

>>> %%timeit
>>> bball_16_copy = bball_16.copy()
>>> for row in random_data:
        bball_16_copy = bball_16_copy.append(row)
4.88 s ± 190 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That took nearly five seconds for only 1,000 rows. If we instead pass in the entire list of Series, we get an enormous speed increase:

>>> %%timeit
>>> bball_16_copy = bball_16.copy()
>>> bball_16_copy = bball_16_copy.append(random_data)
78.4 ms ± 6.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

By passing in the list of Series, the time has been reduced to under one-tenth of a second. Internally, pandas converts the list of Series to a single DataFrame and then makes the append.
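
The two internal steps described above can be sketched by hand. This is an illustration under stated assumptions, not the pandas source: `base` and `rows` are hypothetical stand-ins for `bball_16` and `random_data`, and `pd.concat` is used for the final step because `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-in for bball_16
base = pd.DataFrame({'AB': [1, 2, 3], 'H': [0, 1, 2]})

# New rows as Series, each named with the index label it should receive
rows = [pd.Series({'AB': i, 'H': i % 2}, name=len(base) + i)
        for i in range(5)]

# Step 1: build one DataFrame from the whole list of Series
new_rows = pd.DataFrame(rows)

# Step 2: perform a single concatenation instead of one per row
combined = pd.concat([base, new_rows])
print(combined.shape)
```

Building the intermediate DataFrame once means the existing data is copied a single time, rather than once for every appended row.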
