The main learning outcomes of this chapter can be summarized as follows: using read_csv and its variations, reading a dataset with Python's open method, reading a file in chunks with the open method, reading directly from a URL, specifying column names from a list, changing the delimiter of a dataset, and so on.

This chapter is a head start on our journey to explore our data and wrangle it until it is fit for modelling. The next chapter goes deeper in this pursuit: we will learn to aggregate values for categorical variables, subset a dataset, merge two datasets, generate random numbers, and sample a dataset.
Cleaning, as we saw in the last chapter, takes about 80% of the modelling time, so it is critically important, and the methods we are learning will come in handy in pursuit of that goal.
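As a quick recap, the open-based techniques listed in this summary (reading a file, naming columns from a list, reading in chunks, and changing the delimiter) can be sketched with the standard library alone. This is a minimal illustration, not the chapter's exact code: the file names, column names, sample rows, and helper functions are all hypothetical. In pandas, read_csv covers the same ideas through its names, sep, and chunksize parameters.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample data; names and values are made up for illustration.
column_names = ["name", "age", "city"]
rows = ["Ann,34,Oslo", "Bob,41,Lima", "Cy,29,Pune", "Dee,52,Rome", "Eve,38,Kyiv"]

# Write a small comma-delimited file (no header row) so there is something to read.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "people.csv")
with open(csv_path, "w") as f:
    f.write("\n".join(rows) + "\n")


def read_with_open(path, names, delimiter=","):
    """Read a delimited file line by line into a dict of column lists."""
    data = {name: [] for name in names}
    with open(path, "r") as f:
        for line in f:
            values = line.strip().split(delimiter)
            for name, value in zip(names, values):
                data[name].append(value)
    return data


def read_in_chunks(path, chunk_size):
    """Yield lists of at most chunk_size lines, so the whole file is never in memory."""
    with open(path, "r") as f:
        while True:
            chunk = [line.strip() for line in islice(f, chunk_size)]
            if not chunk:
                break
            yield chunk


def change_delimiter(src, dst, old=",", new="\t"):
    """Copy src to dst, swapping the delimiter (naive: ignores quoted fields)."""
    with open(src, "r") as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(line.replace(old, new))


data = read_with_open(csv_path, column_names)      # column names come from a list
chunks = list(read_in_chunks(csv_path, 2))         # two lines per chunk
tsv_path = os.path.join(tmp_dir, "people.tsv")
change_delimiter(csv_path, tsv_path)               # comma-separated -> tab-separated
tsv_data = read_with_open(tsv_path, column_names, delimiter="\t")
```

Reading in fixed-size chunks keeps memory use bounded for large files, which is the main reason to prefer it over loading everything at once. The delimiter swap here is a plain string replacement, so it assumes no field contains the delimiter inside quotes.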