Reading in the data

We have collected the web statistics for the last month and aggregated them in a file named ch01/data/web_traffic.tsv (.tsv because it contains tab-separated values). They are stored as the number of hits per hour. Each line contains the hour and the number of web hits in that hour. The hours are listed consecutively.

Using NumPy's genfromtxt(), we can easily read in the data with the following code:

>>> import numpy as np
>>> data = np.genfromtxt("web_traffic.tsv", delimiter="\t")

We have to specify tabs as the delimiter so that the columns are correctly determined. A quick check shows that we have correctly read in the data:

>>> print(data[:10])
[[ 1.00000000e+00 2.27333105e+03]
 [ 2.00000000e+00 1.65725549e+03]
 [ 3.00000000e+00 nan]
 [ 4.00000000e+00 1.36684644e+03]
 [ 5.00000000e+00 1.48923438e+03]
 [ 6.00000000e+00 1.33802002e+03]
 [ 7.00000000e+00 1.88464734e+03]
 [ 8.00000000e+00 2.28475415e+03]
 [ 9.00000000e+00 1.33581091e+03]
 [ 1.00000000e+01 1.02583240e+03]]
>>> print(data.shape)
(743, 2)

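As you can see, we have 743 data points, each with two values: the hour and the number of web hits in that hour. The nan in the third row shows that the hit count for that hour is missing.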
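To try the same loading step without the real file, the following minimal sketch feeds genfromtxt() a small in-memory sample. The three sample rows are hypothetical and only mimic the hour/hits layout described above; the names sample, hours, hits, and n_missing are likewise made up for illustration.

```python
import io

import numpy as np

# Hypothetical sample in the same layout as the file: hour <TAB> hits.
# The second row carries "nan" to mimic an hour with no recorded hits.
sample = "1\t2273\n2\tnan\n3\t1657\n"

# genfromtxt() accepts a file-like object as well as a filename.
data = np.genfromtxt(io.StringIO(sample), delimiter="\t")

print(data.shape)  # (3, 2): three rows, two columns

# Split the two columns into separate one-dimensional arrays.
hours = data[:, 0]
hits = data[:, 1]

# Count the rows whose hit count is missing.
n_missing = int(np.isnan(hits).sum())
print(n_missing)  # 1
```

Running the same nan count on the real data array would show how many hours have no usable hit count.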
