The summary() function

Employing the summary() function is an easy way to take a first look at how your data are distributed. You just pass the data frame you are working with to the function, and it prints the following summary statistics for each numeric column:

  • minimum
  • first quartile
  • median
  • mean
  • third quartile
  • maximum

Let's try, for instance, applying this function to the ToothGrowth dataset, one of the built-in datasets that come with base R (to discover more of them, just run data() in your R console):

summary(ToothGrowth)
      len        supp         dose      
 Min.   : 4.20   OJ:30   Min.   :0.500  
 1st Qu.:13.07   VC:30   1st Qu.:0.500  
 Median :19.25           Median :1.000  
 Mean   :18.81           Mean   :1.167  
 3rd Qu.:25.27           3rd Qu.:2.000  
 Max.   :33.90           Max.   :2.000  

As you can see, for each variable (len, supp, and dose) we get its summary statistics. One final, relevant note: since supp is a categorical variable (a factor), for it we get instead the two values it takes, OJ and VC, together with their frequencies, which are 30 for both. That is a lot of information from just one function, isn't it?
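To see where the numbers in the output above come from, the statistics can be recomputed one by one with base R functions. The sketch below assumes a reasonably recent R (3.4.0 or later, where summary() on a numeric vector no longer rounds its returned values); the names len_stats and supp_counts are purely illustrative.

```r
# Recompute, one statistic at a time, what summary(ToothGrowth) reports.
# ToothGrowth comes from the 'datasets' package, attached by default in R.
data(ToothGrowth)

len <- ToothGrowth$len

# The six statistics for a numeric column, computed "by hand".
# quantile() defaults to type = 7, the same definition summary() uses.
len_stats <- c(
  Min.      = min(len),
  `1st Qu.` = unname(quantile(len, 0.25)),
  Median    = median(len),
  Mean      = mean(len),
  `3rd Qu.` = unname(quantile(len, 0.75)),
  Max.      = max(len)
)
print(len_stats)

# summary() on the same numeric vector returns these six values;
# only its print method rounds them for display.
print(summary(len))

# For a factor such as supp, summary() reports how often each level occurs,
# which is the same information table() returns.
supp_counts <- table(ToothGrowth$supp)
print(supp_counts)
```

Computing a single statistic directly (for example, median(ToothGrowth$len)) is handy when you need the exact, unrounded value rather than the rounded figures in the printed summary.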
