There's more...

A consequence of pandas using different syntax for the logical operators is that operator precedence is no longer the same. The comparison operators have a higher precedence than and, or, and not. However, the new operators for pandas (the bitwise operators &, |, and ~) have a higher precedence than the comparison operators, thus the need for parentheses. An example can help clear this up. Take the following expression:

>>> 5 < 10 and 3 > 4
False

In the preceding expression, 5 < 10 evaluates first, followed by 3 > 4, and finally, the and evaluates. Python progresses through the expression as follows:

>>> 5 < 10 and 3 > 4
>>> True and 3 > 4
>>> True and False
>>> False

Let's take a look at what would happen if the expression in criteria3 was written as follows:

>>> movie.title_year < 2000 | movie.title_year > 2009
TypeError: cannot compare a dtyped [float64] array with a scalar of type [bool]

As the bitwise operators have higher precedence than the comparison operators, 2000 | movie.title_year is evaluated first, which is nonsensical and raises an error. Therefore, parentheses are needed to have the operations evaluated in the correct order.
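One way to confirm this grouping without loading any data is to ask Python how it parses the expression. The following is a minimal sketch using the standard library ast module. The name year is a placeholder standing in for movie.title_year, and nothing is actually evaluated:

```python
import ast

# Parse the unparenthesized expression without evaluating it.
# `year` is a placeholder name standing in for movie.title_year.
tree = ast.parse("year < 2000 | year > 2009", mode="eval").body

# The top-level node is a single chained comparison, not a bitwise OR.
is_comparison = isinstance(tree, ast.Compare)

# The middle operand of that comparison is the bitwise OR `2000 | year`.
middle = tree.comparators[0]
middle_is_bitor = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitOr)

# With parentheses, the bitwise OR becomes the top-level operation.
fixed = ast.parse("(year < 2000) | (year > 2009)", mode="eval").body
fixed_is_bitor = isinstance(fixed, ast.BinOp) and isinstance(fixed.op, ast.BitOr)

print(is_comparison, middle_is_bitor, fixed_is_bitor)
```

Without parentheses, Python reads the expression as year < (2000 | year) > 2009. With parentheses, each comparison is completed before the two results are combined with |.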

Why can't pandas use and, or, and not? When these keywords are evaluated, Python attempts to find the truthiness of the objects as a whole. As it does not make sense for a Series as a whole to be either True or False (only each element can be), pandas raises a ValueError.
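The difference between the keyword and the operator can be sketched with a toy class. Flags is a hypothetical stand-in for a boolean Series, written only for illustration; it is not how pandas is implemented. It shows that & dispatches to a method the class controls, whereas and asks the object for a single truth value:

```python
class Flags:
    """Toy stand-in for a boolean Series (hypothetical, not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # `&` dispatches here, so it can work element by element.
        return Flags(a and b for a, b in zip(self.values, other.values))

    def __bool__(self):
        # `and`, `or`, and `not` call this to get a single truth value.
        raise ValueError("The truth value of a Flags object is ambiguous.")


left = Flags([True, False, True])
right = Flags([True, True, False])

# The operator combines the two objects element by element.
elementwise = (left & right).values

# The keyword needs one truth value for `left`, which the class refuses to give.
try:
    left and right
    keyword_error = None
except ValueError as exc:
    keyword_error = str(exc)

print(elementwise)
print(keyword_error)
```

Python offers no hook for overriding and, or, and not themselves, which is why an element-wise library has to rely on the overloadable operators &, |, and ~.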

Many objects in Python have a boolean representation. For instance, all integers except 0 are considered True. All strings except the empty string are True. All non-empty sets, tuples, dictionaries, and lists are True. A DataFrame or Series, whether empty or not, does not evaluate as True or False; instead, an error is raised. In general, to retrieve the truthiness of a Python object, pass it to the bool function.
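As a quick check of these rules, the following sketch passes a handful of built-in objects to bool. The sample values are arbitrary and chosen only for illustration:

```python
# Arbitrary sample objects: each falsy value is followed by a truthy counterpart.
samples = [0, 7, "", "pandas", [], [0], {}, {"a": 1}, set(), (None,)]

# Map each object's repr to its truthiness.
truthiness = {repr(obj): bool(obj) for obj in samples}

for text, value in truthiness.items():
    print(f"bool({text}) -> {value}")
```

Note that [0] and (None,) are True: for containers, only emptiness matters, not the truthiness of the elements inside.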