Reading in the data

We have collected the web statistics for the last month and aggregated them in a file named ch01/data/web_traffic.tsv (.tsv because it contains tab-separated values). The data are stored as the number of hits per hour. Each line contains the hour and the number of web hits in that hour, and the hours are listed consecutively.
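The following minimal sketch makes the layout concrete. It parses a few lines in the described format from an in-memory string instead of the real file, then checks that the hour column increases by exactly one from row to row. The sample values are the first rows of the data as printed in this section. Writing the missing third value as a literal `nan` is an assumption about how the file encodes a missing hour.

```python
import io

import numpy as np

# Hypothetical excerpt in the described format: hour <TAB> hits.
# The "nan" entry stands in for an hour without a recorded value (an assumption).
sample = "1\t2273.33105\n2\t1657.25549\n3\tnan\n4\t1366.84644\n"

data = np.genfromtxt(io.StringIO(sample), delimiter="\t")

hours = data[:, 0]
# "Listed consecutively" means every step between neighbouring rows is exactly 1.
consecutive = bool(np.all(np.diff(hours) == 1))
print(data.shape, consecutive)
```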

Using NumPy's genfromtxt(), we can read in the data with the following code:

>>> import numpy as np
>>> data = np.genfromtxt("web_traffic.tsv", delimiter="\t")
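The explicit tab delimiter matters most when a field is empty. The sketch below uses a hypothetical three-line excerpt in which the hits field for hour 3 is blank. It compares the tab delimiter with genfromtxt's default, which splits on any run of whitespace. With the tab delimiter, the blank field is treated as missing and filled with nan. With the default, the line collapses to a single column, and genfromtxt raises a ValueError.

```python
import io

import numpy as np

# Hypothetical excerpt: hour 3 has an empty hits field after the tab.
sample = "1\t2273.33105\n2\t1657.25549\n3\t\n"

# Explicit tab delimiter: "3\t" splits into ["3", ""], and the empty
# field is treated as a missing value and filled with nan.
with_tab = np.genfromtxt(io.StringIO(sample), delimiter="\t")
print(with_tab.shape)  # (3, 2)

# Default delimiter (any whitespace): "3\t" yields only one field, so that
# row has the wrong number of columns and genfromtxt raises ValueError.
try:
    np.genfromtxt(io.StringIO(sample))
    default_failed = False
except ValueError:
    default_failed = True
print(default_failed)
```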

We have to specify a tab as the delimiter so that the columns are split correctly. A quick check shows that we have read in the data correctly:

>>> print(data[:10])
[[ 1.00000000e+00 2.27333105e+03]
 [ 2.00000000e+00 1.65725549e+03]
 [ 3.00000000e+00 nan]
 [ 4.00000000e+00 1.36684644e+03]
 [ 5.00000000e+00 1.48923438e+03]
 [ 6.00000000e+00 1.33802002e+03]
 [ 7.00000000e+00 1.88464734e+03]
 [ 8.00000000e+00 2.28475415e+03]
 [ 9.00000000e+00 1.33581091e+03]
 [ 1.00000000e+01 1.02583240e+03]]
>>> print(data.shape)
(743, 2)

As you can see, we have 743 data points, each with two dimensions: the hour and the number of hits in that hour.
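The printed rows already contain a nan in the hits column, so a natural follow-up check is to split the two dimensions into separate vectors and count the unreadable entries. This is a minimal sketch that uses a small in-memory excerpt (the first four rows as printed above) in place of the full file. The names `x`, `y` and `n_missing` are illustrative.

```python
import io

import numpy as np

# Hypothetical excerpt standing in for the full web_traffic.tsv file.
sample = "1\t2273.33105\n2\t1657.25549\n3\tnan\n4\t1366.84644\n"
data = np.genfromtxt(io.StringIO(sample), delimiter="\t")

# Column 0 holds the hour, column 1 the number of hits in that hour.
x = data[:, 0]
y = data[:, 1]

# Count the hours whose hit count is nan (missing or unreadable).
n_missing = int(np.sum(np.isnan(y)))
print(len(x), n_missing)
```

On the full file, the same two lines that slice `data` give 743-element vectors. Any nan entries would need to be dropped or handled before further processing.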
