Complete this recipe in the following three steps:
- Import the dataset.
- Write the customized function.
- Use the newly defined function in the dplyr framework.
The following code blocks implement these three steps:
library(dplyr)  # provides %>%, select(), group_by(), and do()
USAairlineData2016 <- read.csv("USAairlineData2016.csv", as.is = TRUE)
# the new customized function to calculate summary statistics;
# it returns a one-row data frame so that it can be used inside do()
fourNumSum <- function(x) {
  MIN_DELAY <- min(x, na.rm = TRUE)
  MEAN_DELAY <- mean(x, na.rm = TRUE)
  MEDIAN_DELAY <- median(x, na.rm = TRUE)
  MAX_DELAY <- max(x, na.rm = TRUE)
  return(data.frame(MIN_DELAY = MIN_DELAY, MEAN_DELAY = MEAN_DELAY,
                    MEDIAN_DELAY = MEDIAN_DELAY, MAX_DELAY = MAX_DELAY))
}
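As a quick check before moving to the grouped call, the function can be tried on a small vector. The sketch below restates fourNumSum in a compact form so that it runs on its own; the toy_delay vector is made up for illustration and is not taken from the airline dataset.

```r
# Restated from the recipe so that this sketch is self-contained
fourNumSum <- function(x) {
  data.frame(
    MIN_DELAY = min(x, na.rm = TRUE),
    MEAN_DELAY = mean(x, na.rm = TRUE),
    MEDIAN_DELAY = median(x, na.rm = TRUE),
    MAX_DELAY = max(x, na.rm = TRUE)
  )
}

# Hypothetical delays in minutes, including one missing value
toy_delay <- c(-5, 0, 10, NA, 35)

# The missing value is ignored because of na.rm = TRUE
res <- fourNumSum(toy_delay)
print(res)
```

The result is a data frame with a single row and four columns. That shape matters for the next step, because do() expects the expression it evaluates for each group to return a data frame.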
- Now, use the fourNumSum function within the dplyr framework as follows:
desStat <- USAairlineData2016 %>%
  select(MONTH, ORIGIN, DEP_DELAY) %>%   # keep only the columns needed
  group_by(ORIGIN, MONTH) %>%            # one group per airport and month
  do(fourNumSum(.$DEP_DELAY))            # apply the function to each group
- The new desStat object contains one row for every combination of ORIGIN and MONTH that occurs in the data, with the minimum, mean, median, and maximum of the DEP_DELAY variable for that group as calculated by the fourNumSum function.
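The grouped call can be explored without the CSV file. The following is a minimal sketch on a made-up data frame named toy (a hypothetical stand-in, not real airline data) that reuses the recipe's column names. It also shows one possible alternative: the dplyr documentation marks do() as superseded, and, assuming dplyr 1.0.0 or later, summarise() can take a function that returns a data frame and spread its columns into the result.

```r
library(dplyr)

# Restated from the recipe so that this sketch is self-contained
fourNumSum <- function(x) {
  data.frame(
    MIN_DELAY = min(x, na.rm = TRUE),
    MEAN_DELAY = mean(x, na.rm = TRUE),
    MEDIAN_DELAY = median(x, na.rm = TRUE),
    MAX_DELAY = max(x, na.rm = TRUE)
  )
}

# Hypothetical stand-in for USAairlineData2016
toy <- data.frame(
  ORIGIN = c("JFK", "JFK", "JFK", "LAX", "LAX"),
  MONTH = c(1, 1, 2, 1, 1),
  DEP_DELAY = c(10, NA, -3, 0, 20),
  stringsAsFactors = FALSE
)

# Same pattern as in the recipe: one call to fourNumSum per group
desStatToy <- toy %>%
  group_by(ORIGIN, MONTH) %>%
  do(fourNumSum(.$DEP_DELAY))

# Alternative sketch, assuming dplyr >= 1.0.0: the unnamed data frame
# returned by fourNumSum is unpacked into separate columns
desStatAlt <- toy %>%
  group_by(ORIGIN, MONTH) %>%
  summarise(fourNumSum(DEP_DELAY), .groups = "drop")

print(desStatToy)
print(desStatAlt)
```

Both results have one row per group that is present in toy. There is no row for LAX in month 2, because that combination never appears in the input.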