The str function

Once you know that str stands for structure, it is easy to see what this function is for: it gives you a compact description of the attributes and hierarchies within your data. Let's run it on the three datasets we are dealing with:

str(cash_flow_report) 
str(customer_list)
str(stored_data)

Can you see the result? For cash_flow_report we have:

'data.frame': 84 obs. of 3 variables:
$ x : chr "north_america" "south_america" "asia" "europe" ...
$ y : chr "2014-03-31" "2014-03-31" "2014-03-31" "2014-03-31" ...
$ cash_flow: num 100956 111817 132019 91369 109864 ...

While customer_list gives us:

'data.frame':    148555 obs. of  3 variables:
$ customer_code : num 1 2 3 4 5 6 7 8 9 10 ...
$ commercial_portfolio: chr "less affluent" "less affluent" "less affluent" "less affluent" ...
$ business_unit : chr "retail_bank" "retail_bank" "retail_bank" "retail_bank" ...

And finally, stored_data results in:

'data.frame':    891330 obs. of  9 variables:
$ attr_3 : num 0 0 0 0 0 0 0 0 0 0 ...
$ attr_4 : num 0 0 0 0 0 0 0 0 0 0 ...
$ attr_5 : num 0 0 0 0 0 0 0 0 0 0 ...
$ attr_6 : num 0 0 0 0 0 0 0 0 0 0 ...
$ attr_7 : num 0 0 0 0 0 0 0 0 0 0 ...
$ default_flag : Factor w/ 3 levels "?","0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ customer_code: num 1 2 3 4 5 6 7 8 9 10 ...
$ parameter : chr "attr_8" "attr_8" "attr_8" "attr_8" ...
$ value : num NA NA -1e+06 -1e+06 NA NA NA -1e+06 NA -1e+06 ...

We therefore have, for each dataset, the list of columns, their types (num, chr, and Factor here), and the first few values of each column. We will assess how tidy our data is at the end of these paragraphs; in the meantime, consider the three tidy data rules introduced previously and try to figure out which of the three data frames, if any, respects them.
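To make the reading of this output more concrete, here is a minimal sketch on a small, hypothetical data frame. The toy object and its values are assumptions made up for illustration and are not part of the chapter's datasets; only the column names echo the ones seen above. It shows how the types reported by str can also be retrieved programmatically, and how the integers printed for a Factor column relate to its levels.

```r
# Hypothetical toy data frame, NOT one of the chapter's datasets.
toy <- data.frame(
  customer_code = c(1, 2, 3),
  business_unit = c("retail_bank", "retail_bank", "corporate_bank"),
  default_flag  = factor(c("?", "0", "1"), levels = c("?", "0", "1")),
  stringsAsFactors = FALSE
)

# Print the structure: one line per column, with its type and first values.
str(toy)

# The column types can also be collected as a named character vector.
column_types <- sapply(toy, class)
print(column_types)

# For a factor, str() prints integer codes rather than the labels;
# each code is the position of the value within levels().
print(levels(toy$default_flag))
print(as.integer(toy$default_flag))
```

Read this way, the run of 1s shown for default_flag in stored_data points to the first level, "?", for each of the displayed records.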
