How to do it...

  1. Import the employee data and select the DEPARTMENT and BASE_SALARY columns in a new DataFrame:
>>> import pandas as pd
>>> employee = pd.read_csv('data/employee.csv')
>>> dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]
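  1. Sort this smaller DataFrame by DEPARTMENT in ascending order and, within each department, by BASE_SALARY in descending order, so that the highest salary in each department comes first:
>>> dept_sal = dept_sal.sort_values(['DEPARTMENT', 'BASE_SALARY'],
...                                 ascending=[True, False])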
  1. Use the drop_duplicates method to keep the first row of each DEPARTMENT, which after the sort holds that department's highest salary:
>>> max_dept_sal = dept_sal.drop_duplicates(subset='DEPARTMENT')
>>> max_dept_sal.head()
  1. Put the DEPARTMENT column into the index of both DataFrames:
>>> max_dept_sal = max_dept_sal.set_index('DEPARTMENT')
>>> employee = employee.set_index('DEPARTMENT')
  1. Now that both indexes contain matching values, we can assign a new column to the employee DataFrame; pandas aligns the assigned values on the index labels:
>>> employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']
>>> employee.head()
  1. We can validate our results with the query method by checking whether any rows exist where BASE_SALARY is greater than MAX_DEPT_SALARY; an empty result means no salary exceeds its department maximum:
>>> employee.query('BASE_SALARY > MAX_DEPT_SALARY')
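
The steps above can be sketched end to end on a small in-memory DataFrame. This is only an illustrative sketch: the five rows below are made-up stand-ins for data/employee.csv, and the name violations is a hypothetical helper variable that does not appear in the recipe.

```python
import pandas as pd

# Hypothetical stand-in for data/employee.csv; these values are made up.
employee = pd.DataFrame({
    'DEPARTMENT': ['Police', 'Fire', 'Police', 'Fire', 'Library'],
    'BASE_SALARY': [60000.0, 52000.0, 75000.0, 58000.0, 41000.0],
})

# Steps 1-2: select the two columns and sort so that each department's
# highest salary is its first row.
dept_sal = employee[['DEPARTMENT', 'BASE_SALARY']]
dept_sal = dept_sal.sort_values(['DEPARTMENT', 'BASE_SALARY'],
                                ascending=[True, False])

# Step 3: keep only the first (highest-paid) row per department.
max_dept_sal = dept_sal.drop_duplicates(subset='DEPARTMENT')

# Step 4: move DEPARTMENT into the index of both DataFrames.
max_dept_sal = max_dept_sal.set_index('DEPARTMENT')
employee = employee.set_index('DEPARTMENT')

# Step 5: the assignment aligns on the DEPARTMENT index labels, so every
# employee row receives its own department's maximum.
employee['MAX_DEPT_SALARY'] = max_dept_sal['BASE_SALARY']

# Step 6: no row should have a salary above its department maximum.
violations = employee.query('BASE_SALARY > MAX_DEPT_SALARY')
print(employee)
print(violations.empty)
```

The assignment in step 5 relies on max_dept_sal having one row per department, which drop_duplicates provides here.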