Time for action – handling NaNs with the nanmean(), nanvar(), and nanstd() functions

We will apply jackknife resampling to the stock data. Each value will be omitted in turn by setting it to Not a Number (NaN). The nanmean(), nanvar(), and nanstd() functions can then be used to compute the arithmetic mean, variance, and standard deviation of the remaining values.
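To see why the NaN-aware functions are needed here, the following minimal sketch compares them with the ordinary mean() on a small array. The values are made up for illustration and are not taken from the stock data.

```python
import numpy as np

# Hypothetical sample values; one entry is marked as missing with NaN
values = np.array([1.0, 2.0, np.nan, 4.0])

# The ordinary reduction propagates the NaN into the result
plain_mean = np.mean(values)

# The NaN-aware versions skip the NaN entry and use only 1, 2, and 4
nan_mean = np.nanmean(values)
nan_var = np.nanvar(values)   # population variance (ddof=0 by default)
nan_std = np.nanstd(values)   # square root of nan_var

print(plain_mean, nan_mean, nan_var, nan_std)
```

The plain mean comes out as NaN, while the three NaN-aware results are computed from the remaining values only.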

  1. First, initialize a 30-by-3 array for the estimates (one row per omitted value, one column per statistic) as follows:
    estimates = np.zeros((len(c), 3))
  2. Loop through the values and generate a new dataset by setting one value to NaN at each iteration of the loop. For each new set of values, compute the estimates:
    for i in range(len(c)):
       a = c.copy()
       a[i] = np.nan
       estimates[i] = [np.nanmean(a), np.nanvar(a), np.nanstd(a)]
  3. Print the variance for each estimate (you can also print the mean or standard deviation if you prefer):
    print("Estimates variance", estimates.var(axis=0))

    The following is printed on the screen:

    Estimates variance [ 0.05960347  3.63062943  0.01868965]
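Setting a value to NaN and calling the NaN-aware functions should give the same estimates as physically removing that value. The sketch below checks this with np.delete(); the array c here is a hypothetical stand-in for the closing prices, not the contents of data.csv.

```python
import numpy as np

# Hypothetical stand-in for the closing prices held in c
c = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5])

masked = np.zeros((len(c), 3))   # estimates from the NaN approach
deleted = np.zeros((len(c), 3))  # estimates after removing the value

for i in range(len(c)):
    a = c.copy()
    a[i] = np.nan                # mask one value with NaN
    b = np.delete(c, i)          # drop the same value outright

    masked[i] = [np.nanmean(a), np.nanvar(a), np.nanstd(a)]
    deleted[i] = [b.mean(), b.var(), b.std()]

# Both approaches should agree up to floating-point rounding
print(np.allclose(masked, deleted))
```

The NaN approach keeps the array shape fixed, which can be convenient when the position of each value matters.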
    

What just happened?

We estimated the variances of the arithmetic mean, variance, and standard deviation of a small dataset using jackknife resampling. This gives us an idea of how much each of these statistics changes when a single value is left out. The code for this example can be found in the jackknife.py file in this book's code bundle:

from __future__ import print_function
import numpy as np

c = np.loadtxt('data.csv', delimiter=',', usecols=(6,), unpack=True)

# Initialize estimates array
estimates = np.zeros((len(c), 3))

for i in range(len(c)):
   # Create a temporary copy and omit one value
   a = c.copy()
   a[i] = np.nan

   # Compute estimates
   estimates[i] = [np.nanmean(a), np.nanvar(a), np.nanstd(a)]

print("Estimates variance", estimates.var(axis=0))
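Note that estimates.var(axis=0) is the plain variance of the leave-one-out estimates. The conventional jackknife variance estimate multiplies this by n - 1, where n is the number of values. This rescaling is not part of the script above; the following is a minimal sketch under that assumption, using made-up data in place of data.csv.

```python
import numpy as np

# Hypothetical data standing in for the closing prices in c
c = np.array([336.1, 339.32, 345.03, 344.32, 343.44, 346.5])
n = len(c)

# Leave-one-out means, computed the same way as in the script above
loo_means = np.zeros(n)
for i in range(n):
    a = c.copy()
    a[i] = np.nan
    loo_means[i] = np.nanmean(a)

# Jackknife variance estimate of the mean: (n - 1) times the
# population variance (ddof=0) of the leave-one-out estimates
jack_var = (n - 1) * loo_means.var()

# For the mean, this matches the textbook formula s^2 / n,
# where s^2 is the sample variance (ddof=1)
classic_var = c.var(ddof=1) / n

print(jack_var, classic_var)
```

For the arithmetic mean the two numbers agree, which is a handy sanity check before applying the same scaling to the variance or standard deviation columns.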