How it works…

In step 1 of the previous section, we specified the directory where the CSV file and the XDF file are stored. Then, in step 2, the read.csv() function imports the entire dataset into the R session for further processing. In step 3, the aggregate() function calculates the mean departure delay, and the system.time() function measures how long each operation takes.

The processing time for each step is as follows:

    system.time(
      usairlineCSV <- read.csv("csv_USAairlines2016.csv")
    )
    #    user  system elapsed
    #   19.11    0.51   19.72

Reading the entire CSV file into R's memory took almost 20 seconds. Here is the time required to calculate the mean:

    system.time(
      meanDelay <- with(usairlineCSV, aggregate(DEP_DELAY,
        by = list(ORIGIN, DEST), FUN = "mean", na.rm = TRUE))
    )
    #    user  system elapsed
    #    7.02    0.34    7.37

It took more than 7 seconds to calculate the mean delay for each combination of the origin and destination airports.
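For readers without csv_USAairlines2016.csv, the following is a minimal, self-contained sketch of steps 2 and 3 on synthetic data. Only the column names ORIGIN, DEST, and DEP_DELAY come from the recipe; the airport codes, row count, random values, and temporary file are hypothetical assumptions, so the timings will not match those reported above.

```r
# Hypothetical stand-in for csv_USAairlines2016.csv: same column names,
# random values and far fewer rows, so timings will differ from the book's.
set.seed(42)
n <- 2e5
airports <- c("ATL", "DEN", "DFW", "LAX", "ORD")
flights <- data.frame(
  ORIGIN = sample(airports, n, replace = TRUE),
  DEST = sample(airports, n, replace = TRUE),
  DEP_DELAY = round(rnorm(n, mean = 10, sd = 30))
)
flights$DEP_DELAY[sample(n, n / 100)] <- NA  # some flights have no delay value

# Step 2: write a temporary CSV file, then time reading it back with read.csv()
csvPath <- tempfile(fileext = ".csv")
write.csv(flights, csvPath, row.names = FALSE)
readTime <- system.time(flightsCSV <- read.csv(csvPath))

# Step 3: time the mean departure delay per origin/destination pair;
# aggregate() passes na.rm = TRUE on to mean()
aggTime <- system.time(
  meanDelay <- with(flightsCSV, aggregate(DEP_DELAY,
    by = list(ORIGIN = ORIGIN, DEST = DEST), FUN = mean, na.rm = TRUE))
)
print(readTime)
print(aggTime)
print(head(meanDelay))
unlink(csvPath)  # remove the temporary file
```

Naming the elements of the `by` list makes the grouping columns of the result read ORIGIN and DEST instead of Group.1 and Group.2; the computed means are the same.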

In step 4, the xdfFile object stores the location of the XDF file. Because file.path() only builds a character string, this step reads no data into the R session. In step 5, this object is passed as the data argument of rxSummary(), which reads the XDF file from disk in chunks and calculates the mean departure delay. The time required for steps 4 and 5 is as follows:

    system.time(
      xdfFile <- file.path(getwd(), "USAirlines2016.xdf")
    )
    #    user  system elapsed
    #       0       0       0

    # rxSummary() comes from the RevoScaleR package
    system.time(
      sumstatxdf <- rxSummary(DEP_DELAY ~ ORIGIN:DEST,
        summaryStats = "Mean", data = xdfFile)
    )
    # Rows Read: 5562425, Total Rows Processed: 5562425, Total Chunk Time: 0.335 seconds
    # Computation time: 0.477 seconds.
    #    user  system elapsed
    #    1.21    0.00    1.83

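Defining the reference to the XDF file took no measurable time, and calculating the mean for each combination of origin and destination took less than 2 seconds (1.83 seconds elapsed). By comparison, aggregate() needed more than 7 seconds on the in-memory data frame, after almost 20 seconds spent reading the CSV file.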
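The rxSummary() output reports rows read and a total chunk time, which indicates that the XDF data is processed in blocks rather than loaded whole. The following base R function is a minimal sketch of that block-wise idea, not RevoScaleR's implementation: it reads a CSV file in fixed-size chunks through an open connection and accumulates per-group sums and counts, so only one chunk of rows is held in memory at a time. The function name chunkedGroupMean(), the chunk size, and the synthetic demo data are hypothetical.

```r
# Hypothetical helper: mean DEP_DELAY per ORIGIN/DEST pair, computed chunk by
# chunk so the whole file is never held in memory (a sketch of block-wise
# processing, not how RevoScaleR is implemented).
chunkedGroupMean <- function(path, chunkSize = 50000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once; later reads continue from the current position
  colNames <- scan(con, what = "", sep = ",", nlines = 1, quiet = TRUE)
  partial <- list()
  repeat {
    chunk <- tryCatch(
      read.csv(con, header = FALSE, col.names = colNames, nrows = chunkSize),
      error = function(e) NULL  # treat a read error at end of file as "no rows"
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    ok <- !is.na(chunk$DEP_DELAY)  # same effect as na.rm = TRUE
    if (any(ok)) {
      key <- paste(chunk$ORIGIN[ok], chunk$DEST[ok], sep = "->")
      delay <- as.numeric(chunk$DEP_DELAY[ok])
      # Per-group sums and counts for this chunk only
      partial[[length(partial) + 1]] <-
        rowsum(cbind(sum = delay, n = rep(1, length(delay))), key)
    }
    if (nrow(chunk) < chunkSize) break  # the last, shorter chunk was just read
  }
  # Combine the per-chunk partial results, then divide sums by counts
  combined <- do.call(rbind, partial)
  totals <- rowsum(combined, rownames(combined))
  data.frame(pair = rownames(totals),
             meanDelay = totals[, "sum"] / totals[, "n"],
             row.names = NULL)
}

# Demo on hypothetical synthetic data written to a temporary CSV file
set.seed(1)
n <- 10000
demo <- data.frame(
  ORIGIN = sample(c("ATL", "DEN", "ORD"), n, replace = TRUE),
  DEST = sample(c("LAX", "SEA"), n, replace = TRUE),
  DEP_DELAY = round(rnorm(n, mean = 10, sd = 30))
)
demo$DEP_DELAY[sample(n, 100)] <- NA
path <- tempfile(fileext = ".csv")
write.csv(demo, path, row.names = FALSE)
chunkMeans <- chunkedGroupMean(path, chunkSize = 1500)
print(chunkMeans)
unlink(path)
```

Besides the current chunk, the function keeps only one small matrix of sums and counts per chunk, so its memory use grows with the number of origin/destination pairs rather than with the number of rows.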
