The sequence of tasks will be as follows:
- Import the dataset.
- Select only the relevant variables.
- Remove the rows with a negative departure delay.
- Calculate the mean delay for each month.
- Visualize the results.
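Before chaining everything together, it can help to see the data-manipulation steps written out one at a time with intermediate variables. The following is only a minimal sketch on a toy data frame: the name flights, the CARRIER column, and all the values are made up for illustration and are not taken from the real dataset.

```r
library(dplyr)

# Toy stand-in for the airline data (hypothetical values, not the real file).
flights <- data.frame(
  MONTH     = c(1, 1, 1, 2, 2, 2),
  DEP_DELAY = c(10, -5, 20, 0, 40, -2),
  CARRIER   = c("AA", "DL", "AA", "UA", "DL", "UA")
)

# Select only the relevant variables.
selected <- select(flights, MONTH, DEP_DELAY)

# Remove the rows with a negative departure delay.
delayed <- filter(selected, DEP_DELAY >= 0)

# Calculate the mean delay for each month.
by_month  <- group_by(delayed, MONTH)
avg_delay <- summarize(by_month, avgDelay = mean(DEP_DELAY))

print(avg_delay)
```

Each step needs its own temporary name (selected, delayed, by_month), which is exactly the bookkeeping that a pipeline removes.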
Here is the code that performs these operations in one go:
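library(dplyr)     # select(), filter(), group_by(), summarize()
library(ggplot2)   # qplot(), xlab(), ylab(), ggtitle(), theme_bw()
library(magrittr)  # add(), an alias for `+`
USAairlineData2016 <- read.csv("USAairlineData2016.csv", as.is = TRUE)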
USAairlineData2016 %>%
select(MONTH, DEP_DELAY) %>%
filter(DEP_DELAY>=0) %>%
group_by(MONTH) %>%
summarize(avgDelay=mean(DEP_DELAY)) %>%
qplot(factor(MONTH), avgDelay, data = ., group = 1, geom = c("line", "point")) %>%
add(xlab("Month")) %>%
add(ylab("Mean delay (in min)")) %>%
add(ggtitle("Mean delay in departure over months of 2016")) %>%
add(theme_bw()) %>%