How it works...

Steps 2 and 3 find the maximum salary for each department. For automatic index alignment to work properly, we set each DataFrame index as the department. Step 5 works because each row index label from the left DataFrame, employee, aligns with one and only one index label from the right DataFrame, max_dept_sal. If max_dept_sal had repeats of any departments in its index, then the operation would fail.
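
As a minimal sketch of this alignment, the following uses a small made-up table in place of the employee dataset (the department names and salaries are illustrative only). It follows the same pattern: find each department's maximum salary, index both DataFrames by department, and let pandas align on that index during assignment.

```python
import pandas as pd

# Made-up data standing in for the employee dataset
employee = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Fire', 'Police', 'Library', 'Fire'],
    'BASE_SALARY': [60000, 55000, 72000, 41000, 58000],
})

# Sort so the highest salary comes first within each department,
# then keep only that first row per department
dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]
max_dept_sal = (dept_sal
                .sort_values(['DEPARTMENT', 'BASE_SALARY'],
                             ascending=[True, False])
                .drop_duplicates(subset='DEPARTMENT')
                .set_index('DEPARTMENT'))

# Index the left DataFrame by department as well
employee = employee.set_index('DEPARTMENT')

# Each label on the left matches exactly one label on the right,
# so the assignment aligns without error
employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']
print(employee)
```

The left index may contain repeated departments; what matters is that the index on the right-hand side is unique.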

For instance, let's see what happens when the DataFrame on the right-hand side of the assignment has repeated index values. We use the DataFrame sample method to randomly choose ten rows without replacement:

>>> np.random.seed(1234)
>>> random_salary = dept_sal.sample(n=10).set_index('DEPARTMENT')
>>> random_salary

Notice how there are several repeated departments in the index. Now when we attempt to create a new column, an error is raised alerting us that there are duplicates. At least one index label in the employee DataFrame is joining with two or more index labels from random_salary:

>>> employee['RANDOM_SALARY'] = random_salary['BASE_SALARY']
ValueError: cannot reindex from a duplicate axis
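
The failure can be reproduced on a small scale. The sketch below uses made-up data rather than the employee dataset, and the names deduped and error_raised are illustrative. Depending on the pandas version, the text of the ValueError may differ from the one shown above, so the example relies only on the exception type. Keeping the first row per department is just one possible way to restore a unique index, and may not be the right choice for real data.

```python
import pandas as pd

# Made-up data: the left DataFrame is indexed by department
employee = pd.DataFrame(
    {'BASE_SALARY': [60000, 55000, 72000]},
    index=pd.Index(['Police', 'Fire', 'Police'], name='DEPARTMENT'))

# The right DataFrame repeats 'Police' in its index
random_salary = pd.DataFrame(
    {'BASE_SALARY': [72000, 55000, 60000, 41000]},
    index=pd.Index(['Police', 'Fire', 'Police', 'Library'],
                   name='DEPARTMENT'))

# Checking uniqueness up front predicts whether alignment can work
print(random_salary.index.is_unique)

try:
    employee['RANDOM_SALARY'] = random_salary['BASE_SALARY']
    error_raised = False
except ValueError as err:
    # pandas cannot decide which 'Police' row to use
    error_raised = True
    print('ValueError:', err)

# One possible remedy: keep the first row for each department
deduped = random_salary[~random_salary.index.duplicated(keep='first')]
employee['RANDOM_SALARY'] = deduped['BASE_SALARY']
print(employee)
```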