- Import the employee data and select the DEPARTMENT and BASE_SALARY columns in a new DataFrame:
>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]
- Sort this smaller DataFrame by department, and by salary from highest to lowest within each department:
>>> dept_sal = dept_sal.sort_values(['DEPARTMENT', 'BASE_SALARY'],
...                                 ascending=[True, False])
- Use the drop_duplicates method to keep the first row of each DEPARTMENT, which after the sort is the row with that department's highest salary:
>>> max_dept_sal = dept_sal.drop_duplicates(subset='DEPARTMENT')
>>> max_dept_sal.head()
- Put the DEPARTMENT column into the index of both DataFrames:
>>> max_dept_sal = max_dept_sal.set_index('DEPARTMENT')
>>> employee = employee.set_index('DEPARTMENT')
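- Now that the indexes contain matching values, we can add a new column to the employee DataFrame. pandas aligns the assigned Series on the index, so each row receives the maximum salary of its own department: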
>>> employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']
>>> employee.head()
- We can validate our results with the query method by checking whether any rows have a BASE_SALARY greater than MAX_DEPT_SALARY. The result should be an empty DataFrame:
>>> employee.query('BASE_SALARY > MAX_DEPT_SALARY')
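The steps above can be tried end to end without the CSV file. The following is a minimal sketch that assumes a small, made-up DataFrame in place of data/employee.csv. The department names, the salaries, and the variable name violations are hypothetical and are not part of the original dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for data/employee.csv
employee = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Fire', 'Police', 'Fire', 'Library'],
    'BASE_SALARY': [60000.0, 52000.0, 75000.0, 48000.0, 41000.0],
})

# Select the two columns and sort so the top salary comes first per department
dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]
dept_sal = dept_sal.sort_values(['DEPARTMENT', 'BASE_SALARY'],
                                ascending=[True, False])

# Keep only the first (highest-paid) row of each department
max_dept_sal = dept_sal.drop_duplicates(subset='DEPARTMENT')

# Move DEPARTMENT into the index of both DataFrames
max_dept_sal = max_dept_sal.set_index('DEPARTMENT')
employee = employee.set_index('DEPARTMENT')

# Index alignment broadcasts each department's maximum to all of its rows
employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']

# No row should have a salary above its department's maximum
violations = employee.query('BASE_SALARY > MAX_DEPT_SALARY')
print(employee)
print(len(violations))
```

The assignment works because the index of max_dept_sal is unique after drop_duplicates. If the right-hand Series had repeated index labels, pandas could not decide which value belongs to each row and the assignment would fail.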