Problems

  1. Cloud storage cost prediction: Our software application generates data every month and stores it in cloud storage together with the data from the previous months. We are given the following bills for the cloud storage, and we would like to estimate the total running cost for the first year of using this cloud storage:

Month of using the cloud storage    Monthly bill in euros
1                                   120.0
2                                   131.2
3                                   142.1
4                                   152.9
5                                   164.3
1 to 12                             ?

  2. Fahrenheit and Celsius conversion: In the earlier example, we devised a formula converting degrees Fahrenheit into degrees Celsius. Devise a formula converting degrees Celsius into degrees Fahrenheit.
  3. Flight duration prediction from the distance: Why do you think the linear regression model estimated the speed to be 1192 km/h rather than the real speed of about 850 km/h? Can you suggest a better way to model the flight duration based on the flight distances and times?
  4. Bacteria population prediction: The bacterium Escherichia coli was observed in the laboratory, and the size of its population was estimated by measurements taken at 5-minute intervals as follows:

Time     Size of population in millions
10:00    47.5
10:05    56.5
10:10    67.2
10:15    79.9
11:00    ?

What is the expected number of bacteria at 11:00, assuming that the population continues to grow at the same rate?

Analysis:

  1. Every month, we pay for all the data stored in the cloud storage so far, including the new data added in that month. We will use linear regression to predict the bill for a general month and then sum the predicted bills of the first 12 months to obtain the cost for the whole year.

Input:

source_code/6/cloud_storage.r
# Bills for the first five months.
bills = data.frame(
  month = c(1, 2, 3, 4, 5),
  bill = c(120.0, 131.2, 142.1, 152.9, 164.3)
)
# Fit bill = intercept + slope * month.
model = lm(bill ~ month, data = bills)
print(model)

Output:

$ Rscript cloud_storage.r
Call:
lm(formula = bill ~ month, data = bills)
Coefficients:
(Intercept)        month
     109.01        11.03

This means that the base cost is base_cost=109.01 euros and that storing the data added in each month costs an additional month_data=11.03 euros. Therefore, the formula for the nth monthly bill is as follows:

bill_amount=month_data*month_number+base_cost=11.03*month_number+109.01 euro
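As a quick check of this formula, the following minimal R sketch refits the model from the listing above and uses predict() to extend it to months 6 to 12. The data frame name future is only illustrative; the predicted bill for month 12 should equal 11.03*12+109.01=241.37 euros.

```r
# Refit the linear model of the monthly bill from the five known bills.
bills = data.frame(month = c(1, 2, 3, 4, 5),
                   bill = c(120.0, 131.2, 142.1, 152.9, 164.3))
model = lm(bill ~ month, data = bills)

# Predicted bills for the remaining months of the first year.
future = data.frame(month = 6:12)
future$bill = predict(model, newdata = future)
print(round(future, 2))
```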

Recall that the sum of the first n positive integers is (1/2)*n*(n+1). Thus the total cost for the first n months is as follows:

total_cost(n months)=base_cost*n+month_data*[(1/2)*n*(n+1)]
=n*[base_cost+month_data*(1/2)*(n+1)]
=n*[109.01+11.03*(1/2)*(n+1)]
=n*[114.525+5.515*n]

Thus, for the whole year, the cost is as follows:

total_cost(12 months)=12*[114.525+5.515*12]=12*180.705=2168.46 euros
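The closed-form total can be cross-checked numerically. In the sketch below, the function name total_cost mirrors the formula above and is our own choice; it evaluates n*[base_cost+month_data*(1/2)*(n+1)] with the fitted coefficients and compares the result with the plain sum of the 12 predicted monthly bills. Both come to 2168.46 euros.

```r
bills = data.frame(month = c(1, 2, 3, 4, 5),
                   bill = c(120.0, 131.2, 142.1, 152.9, 164.3))
model = lm(bill ~ month, data = bills)
base_cost = unname(coef(model)["(Intercept)"])   # about 109.01 euros
month_data = unname(coef(model)["month"])        # about 11.03 euros per month

# Closed form: n * [base_cost + month_data * (n + 1) / 2]
total_cost = function(n) n * (base_cost + month_data * (n + 1) / 2)

# Brute force: add up the predicted bills of months 1 to 12.
brute_force = sum(predict(model, newdata = data.frame(month = 1:12)))

cat("closed form: ", round(total_cost(12), 2), "\n")
cat("sum of bills:", round(brute_force, 2), "\n")
```

The brute-force sum is the safer check when the bill model changes (for example, if it stops being linear), while the closed form only holds for a linear bill.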

Visualization:

In the graph below, the blue line represents the linear model of the monthly bill. The total cost, that is, the sum of the monthly bills lying on this line, grows quadratically with the number of months and corresponds to the area under the line.

  2. There are many ways to obtain the formula converting degrees Celsius into degrees Fahrenheit. One way is to use R: from the R file of the earlier example, we take the following line:

model = lm(celsius ~ fahrenheit, data = temperatures)

We then change it to:

model = lm(fahrenheit ~ celsius, data = temperatures)

Running the modified file, we obtain the desired reversed model:

Call:
lm(formula = fahrenheit ~ celsius, data = temperatures)
Coefficients:
(Intercept)      celsius
       32.0          1.8

So degrees Fahrenheit can be expressed from degrees Celsius as: F=1.8*C+32.

Alternatively, we can obtain this formula by solving the earlier Fahrenheit-to-Celsius formula for F:

C=(5/9)*F-160/9

160/9+C=(5/9)*F

160+9*C=5*F

F=(160+9*C)/5=1.8*C+32
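The derivation can be checked with two small conversion functions. In this sketch, the function names and the test temperatures are our own choices; converting a value to Fahrenheit and back with the earlier formula C=(5/9)*F-160/9 should return the original Celsius value.

```r
# Earlier formula: Fahrenheit to Celsius.
fahrenheit_to_celsius = function(deg_f) (5 / 9) * deg_f - 160 / 9

# Derived formula: Celsius to Fahrenheit.
celsius_to_fahrenheit = function(deg_c) 1.8 * deg_c + 32

celsius = c(-40, 0, 37, 100)
fahrenheit = celsius_to_fahrenheit(celsius)
print(fahrenheit)                          # -40, 32, 98.6 and 212 degrees Fahrenheit
print(fahrenheit_to_celsius(fahrenheit))   # recovers the original Celsius values
```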

  3. The estimated speed is so high because even flights over a short distance take quite long: for example, the flight from London to Amsterdam, two cities only 365 km apart, takes about 1.167 hours. On the other hand, the flight time increases only slowly as the distance grows. The fitted line is therefore shallow, and the model estimates a very long initial setup time (the intercept). Consequently, the speed, which corresponds to the reciprocal of the slope, has to be very high: after the setup time is subtracted, only a small amount of time is left to travel the whole distance.

If we fit the model only to very long flights, where the initial setup time is a much smaller fraction of the total flight time, we could estimate the flight speed more accurately.
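To make the reading of the coefficients concrete, here is a minimal sketch on synthetic data, not the flight data from the chapter: the flight times are generated from an assumed 0.5-hour setup time and an assumed speed of 850 km/h. In the model time = setup_time + distance/speed, the intercept is the setup time and the speed is the reciprocal of the slope. The sketch also restricts the fit to long flights; the 2000 km cutoff is an arbitrary assumption.

```r
# Hypothetical synthetic flights (NOT the chapter's data): each time is
# generated from an assumed 0.5 h setup time and an 850 km/h speed.
flights = data.frame(distance = c(365, 1000, 2500, 5000, 8000))  # km
flights$time = 0.5 + flights$distance / 850                       # hours

# Keep only long flights (arbitrary 2000 km cutoff), where the setup
# time is a small share of the total flight time.
long_flights = subset(flights, distance >= 2000)

model = lm(time ~ distance, data = long_flights)
setup_time = unname(coef(model)["(Intercept)"])   # hours
speed = 1 / unname(coef(model)["distance"])       # km/h: reciprocal of the slope
cat("setup time:", round(setup_time, 3), "h, speed:", round(speed, 1), "km/h\n")
```

With perfectly linear synthetic data, the cutoff does not change the result; on real flight data, it reduces the influence of short flights, where the setup time makes up a large share of the total.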

  4. The population sizes at the 5-minute intervals are 47.5, 56.5, 67.2, and 79.9 million. The differences between successive terms are 9, 10.7, and 12.7, so the sequence is not only increasing but growing faster over time. We therefore look at the ratios of neighboring terms to see how the sequence grows: 56.5/47.5=1.18947, 67.2/56.5=1.18938, and 79.9/67.2=1.18899. The ratios of successive terms are close to each other, so we have reason to believe that the size of the growing population can be estimated using the exponential growth model:
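The differences and ratios quoted above can be reproduced in R with diff() and element-wise division of the shifted vector:

```r
population = c(47.5, 56.5, 67.2, 79.9)   # millions, measured every 5 minutes

differences = diff(population)                              # 9.0 10.7 12.7
ratios = population[-1] / population[-length(population)]   # successive ratios
print(differences)
print(round(ratios, 5))                                     # 1.18947 1.18938 1.18899
```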

n = 47.5 * b^m

where n is the number of bacteria in millions, b is a constant (the base), the exponent m is the number of minutes since 10:00 (the time of the first measurement), and 47.5 is the number of bacteria in millions at this first measurement.

To estimate the constant b, we use the ratios between successive terms. Since the measurements are 5 minutes apart, b^5 is approximately (56.5/47.5 + 67.2/56.5 + 79.9/67.2)/3 = 1.18928. Therefore, the constant b is approximately b = 1.18928^(1/5) = 1.03528. Thus the number of bacteria in millions is:

n = 47.5 * 1.03528^m

At 11:00, which is 60 minutes after 10:00, the estimated number of bacteria is:

47.5 * 1.03528^60, which is approximately 380.3 million.
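The estimate can be reproduced in R. This sketch starts from the first measurement, 47.5 million, averages the three 5-minute ratios, takes the fifth root to get the per-minute base b, and evaluates the model 60 minutes after 10:00. As a cross-check, it also fits log(n) against m with lm(), which is a log-linear least-squares regression and not the ratio method used above, so its estimate may differ slightly.

```r
population = c(47.5, 56.5, 67.2, 79.9)   # millions
minutes = c(0, 5, 10, 15)               # minutes since 10:00

# Ratio method: average 5-minute growth factor, then its fifth root.
ratios = population[-1] / population[-length(population)]
b5 = mean(ratios)                        # about 1.18928 per 5 minutes
b = b5^(1 / 5)                           # about 1.03528 per minute
n_ratio = population[1] * b^60           # 11:00 is 60 minutes after 10:00

# Alternative: least-squares fit of log(population) against minutes.
fit = lm(log(population) ~ minutes)
n_loglinear = exp(predict(fit, newdata = data.frame(minutes = 60)))

cat("ratio method:  ", round(n_ratio, 1), "million\n")
cat("log-linear fit:", round(unname(n_loglinear), 1), "million\n")
```

Averaging the ratios weights each 5-minute interval equally, whereas the log-linear fit uses all four measurements in one least-squares problem; with data this regular, the two estimates should be close.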
