How to do it...

Import the employee data and select the DEPARTMENT and BASE_SALARY columns in a new DataFrame:

>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]

Sort this smaller DataFrame by department and, within each department, by salary from highest to lowest:

>>> dept_sal = dept_sal.sort_values(['DEPARTMENT', 'BASE_SALARY'],
...                                 ascending=[True, False])

Use the drop_duplicates method to keep only the first row of each DEPARTMENT, which, after the descending sort, is the row with the highest salary:

>>> max_dept_sal = dept_sal.drop_duplicates(subset='DEPARTMENT')
>>> max_dept_sal.head()
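
The sort-then-deduplicate idea can be checked on a small frame. The following is a minimal sketch that assumes a made-up five-row table (the `toy` DataFrame and its values are hypothetical, not taken from employee.csv) and compares the result against a plain `groupby(...).max()`:

```python
import pandas as pd

# Hypothetical toy data standing in for the employee file
toy = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Fire', 'Police', 'Fire', 'Library'],
    'BASE_SALARY': [60000.0, 52000.0, 75000.0, 48000.0, 41000.0],
})

# Sort departments A-Z and salaries high-to-low within each department
toy_sorted = toy.sort_values(['DEPARTMENT', 'BASE_SALARY'],
                             ascending=[True, False])

# drop_duplicates keeps the first occurrence by default (keep='first'),
# so each department retains its top-paid row
toy_max = toy_sorted.drop_duplicates(subset='DEPARTMENT')

# Independent cross-check using groupby
by_group = toy.groupby('DEPARTMENT')['BASE_SALARY'].max()

print(toy_max)
print(by_group)
```

Both approaches should report the same maximum for every department; the sorted version has the side benefit of keeping the whole row, not just the salary.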

Put the DEPARTMENT column into the index of each DataFrame:

>>> max_dept_sal = max_dept_sal.set_index('DEPARTMENT')
>>> employee = employee.set_index('DEPARTMENT')

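Now that the indexes contain matching values, we can add a new column to the employee DataFrame. pandas aligns the assigned values on the index labels, so each row receives the maximum salary of its own department: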

>>> employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']
>>> employee.head()
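
To see the index alignment at work in isolation, here is a small sketch under assumed data: `emp` and `max_sal` are hypothetical stand-ins for the employee and max_dept_sal objects, with values invented for illustration. The frame's index repeats department labels, while the Series holds one value per department:

```python
import pandas as pd

# Hypothetical frame indexed by department; labels repeat
emp = pd.DataFrame(
    {'BASE_SALARY': [60000.0, 52000.0, 75000.0, 48000.0]},
    index=pd.Index(['Police', 'Fire', 'Police', 'Fire'], name='DEPARTMENT'),
)

# Hypothetical per-department maximums; the index here is unique
max_sal = pd.Series({'Fire': 52000.0, 'Police': 75000.0}, name='BASE_SALARY')
max_sal.index.name = 'DEPARTMENT'

# Assignment aligns on index labels, not on position, so the two
# 'Police' rows both receive 75000.0 and the 'Fire' rows 52000.0
emp['MAX_DEPT_SALARY'] = max_sal

print(emp)
```

The assignment works here because the right-hand Series has a unique index; each label on the left can be matched to exactly one value on the right.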

We can validate the result with the query method by looking for rows where BASE_SALARY is greater than MAX_DEPT_SALARY; if the new column is correct, no rows are returned:

>>> employee.query('BASE_SALARY > MAX_DEPT_SALARY')
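
A validation of this kind can be sketched on toy data as well. The frame below is hypothetical (its name, `checked`, and its values are assumptions for illustration); the sketch runs the same query and adds a second check with `groupby(...).transform('max')`, which is one possible cross-check rather than part of the recipe:

```python
import pandas as pd

# Hypothetical frame that already carries the per-department maximum
checked = pd.DataFrame(
    {
        'BASE_SALARY': [60000.0, 52000.0, 75000.0, 48000.0],
        'MAX_DEPT_SALARY': [75000.0, 52000.0, 75000.0, 52000.0],
    },
    index=pd.Index(['Police', 'Fire', 'Police', 'Fire'], name='DEPARTMENT'),
)

# No salary should exceed its department maximum, so this is empty
violations = checked.query('BASE_SALARY > MAX_DEPT_SALARY')

# Cross-check: recompute the maximum per department directly
expected = checked.groupby(level='DEPARTMENT')['BASE_SALARY'].transform('max')
matches = (expected == checked['MAX_DEPT_SALARY']).all()

print(len(violations), matches)
```

An empty query result only shows that the new column is an upper bound; the transform comparison additionally confirms that it equals the actual maximum.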
