We begin the analysis of each office's teams by investigating their overall compositions and actions. We have a variety of metrics that may help us understand why one team may perform better than another. The following is a list of common metrics that we can use in our HR perspective:
Let's begin with our analysis and compare the age distribution between the two offices. Instead of using a histogram, we use a frequency polygon so that we can compare more than one distribution in the same chart.
Variables |
Details |
---|---|
Label |
Value |
vEmployeeAgeBinSize |
1 |
Dimensions |
Details |
---|---|
Label |
Value |
Age |
=ValueLoop($(=floor(min({$<_Employee_Active_Flag={1}>} [Employee Age]),vEmployeeAgeBinSize)),$(=floor(max({$<_Employee_Active_Flag={1}>} [Employee Age]),vEmployeeAgeBinSize)),vEmployeeAgeBinSize ) |
Employee Current Office | |
Expressions |
Details |
Label |
Value |
Number of Employees |
sum({$<_Employee_Active_Flag={1}>} if(floor([Employee Age],vEmployeeAgeBinSize)=ValueLoop( $(=floor(min({$<_Employee_Active_Flag={1}>}[Employee Age]),vEmployeeAgeBinSize)),$(=floor(max({$<_Employee_Active_Flag={1}>} [Employee Age]),vEmployeeAgeBinSize)),vEmployeeAgeBinSize ),1,0)) |
Input Box
to edit the vEmployeeAgeBinSize variable.Normally, we would use the class()
function to create a histogram; however, the class()
function generates bins from existing values. A bin without any corresponding value behaves in the same way that missing or null values behaved in Chapter 12, Sales Perspective. Although we are not affected by this behavior when we use a bar chart, we are not so lucky when we work with a line chart. The line jumps from one existing bin directly to another without representing the missing bins as zero.
Therefore, we use the valueloop()
function rather than the class()
function because valueloop()
is a list of numbers that have no relationship with the data. We define the range of valueloop()
dynamically with the min()
and max()
functions and the bin size is determined by the value of the vEmployeeAgeBinSize
variable. The floor()
function helps to round down Employee Age so that we can assign it to the proper bin. We usually use floor()
to round down to the nearest one, but we can also use it to round down to the nearest five, ten, or thousand.
Finally, we add a second degree polynomial trend line to get a general idea of the shape of the distributions. As both resemble a normal distribution, this trend line does give us a general idea of the distribution. We can observe that the Wisconsin office in general has a greater number of young employees and fewer middle-aged ones. If the distribution does not resemble a normal distribution, we can try using trend lines of different degrees or, more simply, enlarge the bin size. Notice how different the distributions look without the trend lines and with an Employee Age Bin Size of 5:
The next analysis entails comparing how well each office pays their employees. We can analyze salary distribution among various groupings, such as office, job function, age, gender, and performance. In the following exercise, we are going to compare how each office pays each job function. Along with the option to use a frequency polygon trellis chart to compare the Salary Distribution by Job Function and Office, we can also use the following box plot chart:
Dimensions |
Details |
---|---|
Label |
Value |
Employee Current Job Title |
Employee Current Job Title |
Employee Current Office |
Employee Current Office |
Expressions | |
Label |
Value |
Lower Quartile |
Fractile( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) , 0.5) -Fractile( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) , 0.25) |
Upper Quartile |
Fractile( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) , 0.75) - Fractile( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) , 0.5) |
Dummy Expression |
|
Attribute Expressions | |
---|---|
Label |
Value |
Background Color |
if([Employee Current Office]='Iowa' ,ARGB(100,178,171,210) ,ARGB(100,253,184,99) ) |
Bar Offset |
Fractile( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) , 0.25) |
Error Below |
([Lower Quartile])- Min( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) ) |
Error Above |
-([Lower Quartile]) + Max( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office], [Employee ID],[Employee Current Job Title]) ) |
Attribute Expressions | |
---|---|
Label |
Value |
if([Employee Current Office]='Iowa' ,ARGB(200,178,171,210) ,ARGB(200,253,184,99) ) |
Background Color
Max( Aggr(only({$<_Employee_Active_Flag={1}>} [Employee Current Salary]),[Employee Current Office] ,[Employee ID],[Employee Current Job Title]) )*1.1
We could create a simple box plot using Box Plot Wizard in the file menu, Tools; however, in this case, we are limited to using one dimension. As a workaround, we use a combo chart with bars and error bars to create a box plot. As a part of this technique, we have to consider the fact that QlikView will draw the same number of error bars as there are expressions. As we have three distinct values in the Employee Current Office
field, we add a third, dummy expression that is not visible because the value is zero. If the second dimension has five distinct values, we create a total of three dummy expressions.
Although this trick may seem awkward at first, the chart is not effective when the second dimension has a large number of distinct values. Therefore, we never expect to choose one with more than five or so values and it is not much trouble to create three dummy expressions.
The last adjustment involves the maximum value of the expression axis scale. The maximum that QlikView automatically calculates is unnecessarily large. So, we dynamically calculate the maximum plus a ten percent cushion. This makes it easier to compare the different box plots.
As a high-employee turnover can affect a team's productivity, we analyze how many employees leave each month. We also analyze how many people we hire and how our team evolves as they accumulate more years of experience. Along with using histograms, frequency polygons, and box plots to visualize the distribution of experience, we can also show a summarized distribution over time with a stacked bar chart.
Dimensions | |
---|---|
Label |
Value |
Year-Month |
Year-Month |
Expressions | |
Label |
Value |
Dismissals |
-sum(Dismissal) |
3+ Years Experience |
count({$<[Employee Tenure in Months]={">36"}>} distinct [Headcount Counter]) |
2-3 Years Experience |
count({$<[Employee Tenure in Months]={">24<=36"}>} distinct [Headcount Counter]) |
1-2 Years Experience |
count({$<[Employee Tenure in Months]={">12<=24"}>} distinct [Headcount Counter]) |
0-1 Year Experience |
count({$<[Employee Tenure in Months]={">0<=12"}>} distinct [Headcount Counter]) |
New Hires |
sum(Hiring) |
This chart is simple to create because much of the work is done in the script. We create a new row in the fact table for every dismissal and new hire. The Dismissal
and Hiring
fields contain the value, 1
, so that we only have to sum them up to discover the total occurrences of each event. Also, every time we add to the employee payroll event, we calculate how many months have passed since they started working for the company.
When our employees take too little vacation and too many sick days, this may indicate an overstressed team who may not be as productive as they could be. We can analyze these events with a bar or line chart, but if we want to visualize them on a daily level, it may be more insightful to use a calendar heat map. This heat map was inspired by Julian Villafuerte's blog post at https://qlikfreak.wordpress.com/2014/03/09/heat-map/.
Dimensions | |
---|---|
Label |
Value |
Month |
Month |
Week |
Week |
<empty> |
Year |
<empty> |
=if(wildmatch(Weekday,'s*','t*') ,left(Weekday,2),left(Weekday,1)) |
Expressions | |
Label |
Value |
% Absent |
sum({$<[Absence Type]={'Sick Day','Unexcused'}>} Absence) / (count(Total <Month,Year> distinct [Headcount Counter]) * max(Total <Month,Year> Day)) |
Sick days will tend to increase as the team grows; so, instead of basing our heat map on the actual number of sick days, we calculate sick days as a percentage of the total number of employee working days. According to the chart, after a lull in sick days in the beginning of 2015, there seems to have been an increase in the later part of the year. We can also compare the different offices by adding [Employee Current Office]
as a dimension and placing it above the Year
dimension.
Our final break down of employee productivity is to analyze our employee training and the results of this training. We expect employees to have greater success in the company after they are trained; however, sometimes it is inevitable that they leave or transfer to another office. Along with analyzing whether employees stay with the company after training, we also take advantage of the SCD Type 2 in our HR data model to analyze how often employees transfer or earn promotions after their training.
Let's create the following bar chart:
Dimensions | |
---|---|
Label |
Value |
Employee Historical Office |
Employee Historical Office |
Expressions | |
Label |
Value |
Dismissal |
-sum({$<_Employee_Active_Flag={0}>} Training) |
Promotion / Transfer |
-count({$<Training={1},_Employee_Active_Flag={1}>} distinct if([Employee Historical Job Title] <> [Employee Current Job Title]and [Employee Historical Office] <> [Employee Current Office],[Employee ID])) |
Transfer |
-count({$<Training={1},_Employee_Active_Flag={1}>} distinct if([Employee Historical Job Title] = [Employee Current Job Title]and [Employee Historical Office] <> [Employee Current Office],[Employee ID])) |
Active w/o Change |
sum({$<_Employee_Active_Flag={1}>} Training)-[Promotion]+[Promotion / Transfer]+[Transfer] |
Promotion |
count({$<Training={1},_Employee_Active_Flag={1}>} distinct if([Employee Historical Job Title] <> [Employee Current Job Title]and [Employee Historical Office] = [Employee Current Office],[Employee ID])) |
The Training={1}
set analysis filters the data model, so we only see each employee's historical job title and office at the time he or she was trained. In this data model, we constantly update the current job title and office in every employee record so that we can always compare it with the historical records.
We use an if-statement within the count()
function in this example in order to highlight how to compare fields of different SCD types. If we are dealing with a large amount of data, then we migrate this if-statement to the script and create a flag in the data model that indicates which dimension attributes have changed. We can then use this flag in the expression's set analysis.
In the chart, we observe that both offices suffer from a large number of dismissals after training employees and employees in the Wisconsin office tend to transfer when they earn a promotion. This example demonstrates why it is important to understand how the data model handles slowly changing dimensions and how we use both SCD Type 1 and SCD Type 2 to create an insightful analysis.