How to do it...

Read in the movie dataset, set the movie title as the index, and then create a boolean Series matching all movies with a content rating of G and an IMDb score less than 4:

>>> import numpy as np
>>> import pandas as pd
>>> movie = pd.read_csv('data/movie.csv', index_col='movie_title')
>>> c1 = movie['content_rating'] == 'G'
>>> c2 = movie['imdb_score'] < 4
>>> criteria = c1 & c2
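
As a minimal sketch of how such a criteria Series is built, the following uses a made-up three-row DataFrame; the titles and values are invented for illustration and do not come from the movie dataset. It also shows that conditions written inline must be wrapped in parentheses, because & binds more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical toy data standing in for movie.csv
toy = pd.DataFrame(
    {"content_rating": ["G", "PG", "G"], "imdb_score": [3.5, 2.0, 7.1]},
    index=["Film A", "Film B", "Film C"],
)

# Each comparison yields a boolean Series aligned to the DataFrame's index
c1 = toy["content_rating"] == "G"
c2 = toy["imdb_score"] < 4

# & combines the two Series element-wise
criteria = c1 & c2

# Written inline, each condition needs its own parentheses
inline = (toy["content_rating"] == "G") & (toy["imdb_score"] < 4)

print(criteria.tolist())  # [True, False, False]
```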

Let's first pass these criteria to the .loc indexer to filter the rows:

>>> movie_loc = movie.loc[criteria]
>>> movie_loc.head()

Let's check whether this DataFrame is exactly equal to the one generated directly from the indexing operator:

>>> movie_loc.equals(movie[criteria])
True

Now let's attempt the same boolean indexing with the .iloc indexer:

>>> movie_iloc = movie.iloc[criteria]
ValueError: iLocation based boolean indexing cannot use an indexable as a mask

It turns out that .iloc cannot take a boolean Series directly, because the Series carries an index. It does, however, accept an ndarray of booleans. To extract the array, use the values attribute:

>>> movie_iloc = movie.iloc[criteria.values]
>>> movie_iloc.equals(movie_loc)
True
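
The difference between the indexers can be reproduced on a small scale. The sketch below assumes an invented three-row DataFrame and uses to_numpy(), which returns the same underlying array as the values attribute. The except clause is deliberately broad, since the exact exception type raised by .iloc may depend on the pandas version and on the type of the index.

```python
import pandas as pd

# Hypothetical toy data; not taken from the movie dataset
toy = pd.DataFrame(
    {"imdb_score": [3.5, 2.0, 7.1]},
    index=["Film A", "Film B", "Film C"],
)
mask = toy["imdb_score"] < 4  # boolean Series that carries the string index

# .loc and the indexing operator accept the Series directly
by_loc = toy.loc[mask]
by_brackets = toy[mask]

# .iloc is expected to reject the Series because of its index
try:
    toy.iloc[mask]
    iloc_accepts_series = True
except (ValueError, NotImplementedError):
    iloc_accepts_series = False

# Dropping the index leaves a plain NumPy array, which .iloc accepts
by_iloc = toy.iloc[mask.to_numpy()]

print(by_iloc.equals(by_loc))
```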

Although not very common, it is possible to do boolean indexing to select particular columns. Here, we select all the columns that have a data type of 64-bit integers:

>>> criteria_col = movie.dtypes == np.int64
>>> criteria_col.head()
color                      False
director_name              False
num_critic_for_reviews     False
duration                   False
director_facebook_likes    False
dtype: bool

>>> movie.loc[:, criteria_col].head()

Because criteria_col is a Series, which always has an index, you must pass its underlying ndarray to make it work with .iloc. The following produces the same result as the previous step:

>>> movie.iloc[:, criteria_col.values].head()
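
Column selection by data type can be checked on a toy frame as well. In this sketch, the column names and values are invented, and the dtypes are set explicitly so the result does not depend on platform defaults. The select_dtypes method is shown only as an alternative for comparison; it is not part of the recipe's steps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with explicit dtypes
toy = pd.DataFrame(
    {
        "title_year": pd.Series([2001, 2010], dtype="int64"),
        "imdb_score": pd.Series([3.5, 7.1], dtype="float64"),
        "num_votes": pd.Series([120, 4500], dtype="int64"),
    }
)

# Boolean Series indexed by column name: True where the dtype is int64
is_int64 = toy.dtypes == np.int64

by_loc = toy.loc[:, is_int64]               # label-based: the Series is fine
by_iloc = toy.iloc[:, is_int64.to_numpy()]  # position-based: needs an ndarray
by_select = toy.select_dtypes(include="int64")  # built-in alternative

print(list(by_loc.columns))  # ['title_year', 'num_votes']
```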

A boolean Series may be used to select rows and then simultaneously select columns with either integers or labels. Remember, you need to put a comma between the row and column selections. Let's keep the row criteria and select content_rating, imdb_score, title_year, and gross:

>>> cols = ['content_rating', 'imdb_score', 'title_year', 'gross']
>>> movie.loc[criteria, cols].sort_values('imdb_score')

This same operation may be replicated with .iloc, but you need to get the integer location of all the columns:

>>> col_index = [movie.columns.get_loc(col) for col in cols]
>>> col_index
[20, 24, 22, 8]

>>> movie.iloc[criteria.values, col_index]
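
The label-to-position translation can be verified on a small example. The sketch below again assumes invented data. It also shows Index.get_indexer, which converts a whole list of labels in one call and may be a convenient substitute for the list comprehension.

```python
import pandas as pd

# Hypothetical toy data; not taken from the movie dataset
toy = pd.DataFrame(
    {
        "gross": [1.0e6, 5.0e5, 2.0e7],
        "content_rating": ["G", "PG", "G"],
        "imdb_score": [3.5, 2.0, 7.1],
    },
    index=["Film A", "Film B", "Film C"],
)
criteria = (toy["content_rating"] == "G") & (toy["imdb_score"] < 4)
cols = ["imdb_score", "gross"]

# Label-based: boolean Series for rows, column labels for columns
by_loc = toy.loc[criteria, cols]

# Position-based: boolean ndarray for rows, integer positions for columns
col_index = [toy.columns.get_loc(col) for col in cols]
by_iloc = toy.iloc[criteria.to_numpy(), col_index]

# get_indexer maps a list of labels to positions in a single call
col_index_alt = toy.columns.get_indexer(cols).tolist()

print(col_index)  # [2, 0]
```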
