Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write
– H. G. Wells
Understanding the importance of data, using proper data in problem formulation, and knowing the pitfalls in problem formulation, data collection and interpretation of data are extremely important. These aspects are explained with illustrative examples in this chapter. The role of statistics in the analysis of data is also explained with illustrative examples.
Whenever a problem is encountered, it has to be spelt out clearly, unambiguously and quantitatively. A problem well stated is half-solved. After the problem is stated, the statement has to be checked for clarity and unambiguity, preferably with resource persons not connected with the problem. To know the nature and magnitude of the problem, appropriate data need to be collected. Details of data collection need to be finalised after ensuring that the data facilitate the ‘measurements’ related to the problem. Data may be readily available, in which case no special data need be collected. If data are available, how those data are being used needs to be examined, since this reflects the reasons for the prevalence of the problem. All these constitute the thought routine on a problem and its data. These are explained in Figure 20.1.
Five illustrative examples drawn from experience are given to indicate the type of pitfalls in problem formulation which have to be avoided through a reality check. The first two examples highlight the point that, at the problem formulation stage, the persons closest to the problem need to be consulted. The next two highlight the point that the problem needs to be closely scrutinised to decide on the appropriate data to be collected. The last one emphasises the need to know the process in order to restore it to normalcy.
Figure 20.1 Thought routine for data search
In an exercise on cost reduction in utilities used in a coal mine, it was found from cost data that conveyor-belt consumption needed priority. Accordingly, the team took up the exercise, titled its project ‘study of tear and wear of conveyor belts’ and planned data collection in the field. It chose a mine and selected certain sections which used conveyor belts. On a sample basis, it physically identified a dozen belts in use and took measurements such as thickness and width on them. It informed the mine manager and the section in-charge about what had been done and that the team would visit again to repeat the measurements on the same belts. When the team returned, to its dismay, none of the identified and marked belts could be found. On checking with the operators, the team learnt, to its utter surprise and embarrassment, that belts do not tear and wear but get damaged and are thrown out.
In a ceramic factory manufacturing decorative crockery, the management found that the bottleneck to production of decorative ware was kilns. Accordingly, it took the decision to add four kilns to convert plain ware to decorative ware. Workmen of the section came to know of the decision. They met the general manager and informed him that additional kilns might not be needed and that production could be increased with the existing facility by reducing the idle time in kiln utilisation and by improving the method of loading the kiln and arranging the plain ware. The workmen offered to work on these issues under the guidance of a supervisor. The result was that the order for new kilns was limited to one instead of four.
In a composite textile mill, several years ago, a study on weaving efficiency was taken up. Over a period of 1 month, elaborate data on weaving efficiency and on loss of efficiency due to different causes were collected and analysed, and the report was submitted to the chief of the organisation.
The chief called for a meeting of the weaving master and the author of the report. The meeting lasted only 2 minutes, with the chief telling the weaving master that the weaving efficiency was low and that the due date for extending the terms of his appointment was only 4 weeks away. With this the weaving master got the right message. The chief, who was two decades older than the author of the report, told him as a piece of advice, “Data are important. Certain improvements can be achieved through ‘banging’. ‘Banging statistics’ must be put to use first, and then other issues need to be taken up.” ‘Banging statistics’ are the facts and figures which can straight away be used to alert and pressurise in order to control and improve. At the end of 2 weeks, another review was made. True to the expectation of the chief, ‘banging statistics’ had worked. The weaving efficiency, which had been low at 70 per cent, had increased to 80 per cent. The chief complimented the weaving master and directed him to study the causes of low efficiency found in the investigation report and take suitable measures.
This is related to a manual operation of adjusting the tension in springs. Twelve operators were involved in this exercise. The rework rate due to improper adjustment was about 8 per cent.
The persons who took up this problem were well versed in the use of statistical techniques. An investigation was taken up. They collected data on spring tension from 10 samples from each operator and subjected the data to analysis by ANOVA. They found that the variation in spring tension within each operator was ‘high’ and that there was no difference between the operators, and came out with the suggestion that each operator has to improve his skill to minimise variation. The results were discussed with the section in-charge.
Two weeks later, he reported that rework on springs due to improper tension can touch ‘zero level’. All he did was to categorise the operators as ‘best’, ‘better’, ‘average’ and ‘poor’ on the basis of the rework rate: best below 1 per cent, better 1–2 per cent, average 2.1–5.0 per cent and poor above 5.0 per cent. The performance of each operator was monitored. Freedom was accorded to watch how the best performers did their job. Likewise, the ‘worst’ ones were also observed. This enabled the individual operators to improve their own skills. The result was that each operator moved towards the ‘best’ level.
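The categorisation above can be sketched in a few lines of Python. This is only an illustration: the operator labels and rework rates are hypothetical, and the handling of boundary values (a rate of exactly 1 per cent counted as ‘better’, anything above 2 and up to 5 per cent counted as ‘average’) is an assumption, since the text does not spell it out.

```python
def categorise(rework_pct):
    """Map an operator's rework rate (per cent) to the category used in the example."""
    if rework_pct < 1.0:
        return "best"
    if rework_pct <= 2.0:      # assumption: exactly 1 per cent counts as 'better'
        return "better"
    if rework_pct <= 5.0:      # assumption: rates between 2 and 2.1 count as 'average'
        return "average"
    return "poor"

# Hypothetical rework rates (per cent); these are NOT data from the study.
rework_by_operator = {"Op-1": 0.4, "Op-2": 1.5, "Op-3": 3.2, "Op-4": 7.9}

categories = {op: categorise(rate) for op, rate in rework_by_operator.items()}
for op, category in categories.items():
    print(op, rework_by_operator[op], category)
```

A table of this kind, reviewed regularly, is all the ‘analysis’ the example needed; no assumption about the distribution of spring tension is involved.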
The above example illustrates the (mis)use of an advanced technique like ANOVA when, in fact, simpler and more straightforward techniques are available. This tendency arises out of a desire to render the task of problem solving more ‘sophisticated and profound’ through the use of advanced techniques, which is analogous to prescribing costly medicines when simple versions would serve the purpose. This desire to force-fit a technique in order to appear sophisticated, termed the ‘Sophisticated Syndrome’, is on the rise with the advent of Six Sigma and its green and black belts. The remedy for this syndrome lies in understanding the problem in all its details and then determining the appropriate techniques, starting from the simple ones.
The overall rework in a certain soldering operation involving 10 operators was 1 per cent. One week was spent on collecting data on soldering defects, cause-wise and operator-wise. The data showed that rework was 5 per cent in the case of one operator, about 1 per cent each in the case of five and zero in the case of the remaining four.
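A short calculation shows how the overall figure of 1 per cent hides the one operator at 5 per cent. The sketch below assumes that all ten operators solder the same number of joints, which the text does not state, and takes ‘about 1 per cent’ as exactly 1.

```python
# Rework rates (per cent) per operator as reported:
# one at 5, five at about 1 (taken as 1.0 here), four at 0.
rates = [5.0] + [1.0] * 5 + [0.0] * 4

# Assumption: every operator handles the same volume of work,
# so the overall rate is the plain mean of the individual rates.
overall = sum(rates) / len(rates)

# Share of the total rework that comes from the single worst operator.
worst_share = max(rates) / sum(rates)

print(f"overall rework: {overall:.1f} per cent")
print(f"share of rework from the worst operator: {worst_share:.0%}")
```

Under this assumption the operator-wise figures reproduce the overall 1 per cent, and half of all rework traces back to a single operator.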
On examination, it was found that
Based on the findings, actions were taken and the rework touched almost zero. When the study report was discussed, there was satisfaction about the findings and results besides the strong feeling that such studies can be avoided by keeping the processes neat, clean and disciplined. “Should data be collected to discover an operator with four fingers?” was the humorous disdain with which the study was perceived.
The pertinent points to be noted from these examples are:
These five steps are, in fact, applied by doctors on any patient and if these do not yield results, further investigations are taken up.
It is a well-recognised fact in the field of neurology that most of the neurological syndromes which have stood the test of time were in fact discovered on the basis of a careful study of single cases, followed by their authentication through repeated observations in other patients, and not by averaging results over a large sample. A large sample of cases of a neurological syndrome is also difficult to obtain. This viewpoint is worth remembering, and it is all the more relevant to other areas as well, since the defect rate in many of them is 10 ppm or less.
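Why sampling and averaging have little to offer at such defect levels can be seen from a rough calculation. The sketch below assumes that units are defective independently of one another at the quoted rate of 10 ppm; the sample sizes are hypothetical and chosen only to show the trend.

```python
defect_rate = 10e-6   # 10 ppm, the level quoted in the text

def prob_at_least_one_defect(sample_size, rate=defect_rate):
    """Probability that a random sample contains one or more defective units.

    Assumes each unit is defective independently with the same probability.
    """
    return 1.0 - (1.0 - rate) ** sample_size

# Hypothetical sample sizes, for illustration only.
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(prob_at_least_one_defect(n), 4))
```

Even a sample of a thousand units would, under these assumptions, contain a defective only about once in a hundred samples, so each defective that does turn up deserves study as a single case.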
It has been noted in Chapter 12 that quality characteristics fall into two categories, measurable and attribute. Accordingly, data also fall into two categories—measurable (variable) and attribute (counting/classification).
It is a common practice to summarise the data and condense it into a few entities to convey the message from the data. In Chapter 19, the tools of arrangement of data are dealt with to identify patterns and get the ‘message’. Here, certain calculations are to be applied to the data to get the message. These are furnished in Table 20.1.
For measurement data, the method of obtaining the average and standard deviation is given in Annexure 20A.
For the data represented here, the method of arriving at the average and standard deviation is shown in Table 20.2.
TABLE 20.1 Data Summarisation Through Calculation
TABLE 20.2 Attribute Data—Average and Standard Deviation
Following points supplement the information in Tables 20.1 and 20.2.
The path of investigation is shown in Figure 20.2.
Figure 20.2 Investigation phases
If it can be known that the favourable or unfavourable response is not due to chance but due to special causes (these need to be probed and found), it is worth looking for these special causes. Use of statistical techniques helps to answer the question related to ‘chance’ and ‘special’ and thus give added strength to the investigation process. Similar logic holds good in the case of (e) and (g) of phases 2 and 3, respectively, in Figure 20.2.
In the investigation phases, step (d) in phase 1, step (e) in phase 2 and step (g) in phase 3 need statistical justification for accepting (i) the message in the case of (d), (ii) a newer combination of processing factors in the case of (e) and (iii) the effectiveness of a decision taken in the case of (g). The following example related to (d) illustrates the point.
Data on yield available in process records was analysed to assess average yield and week-wise variation over 4 weeks. Weekly results are as follows. What is the message from the available data hereunder? Message based on common sense and its intervention through statistical logic is in Table 20.3.
TABLE 20.3 Message Based on Common Sense and its Intervention Through Statistical Logic
Message based on common sense | Statistical intervention to common sense message |
---|---|
i) Observed increase in yield from 87.5 in week 1 to 93.5 in week 3 is real and the causes for such a change are to be probed | a) In the first case, the variation associated with 93.5 is more than that associated with 87.5. A similar observation holds good for the second case also |
ii) Drop in yield from 93.5 in week 3 to 89.0 in week 4 is real and the causes for such a change are to be probed | b) The observed difference is to be weighed against the variation (s.d.) in yield to assess whether the difference in yield is explained in terms of variation due to ‘chance’ (common causes) or ‘special causes’. If the difference is due to common causes, the change (increase or decrease) represented by the difference is not real |
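Weighing a difference against the variation can be sketched as follows. This is a rough two-standard-error check, not the specific techniques of the later chapters. The weekly averages 87.5 and 93.5 are from the example; the standard deviations and the number of observations per week are hypothetical, since the weekly data are not reproduced here.

```python
import math

def difference_in_standard_errors(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (mean_b - mean_a) expressed in units of its standard error."""
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_b - mean_a) / se

# Averages are from the example; sd and n are HYPOTHETICAL illustration values.
week1 = {"mean": 87.5, "sd": 5.0, "n": 6}
week3 = {"mean": 93.5, "sd": 7.0, "n": 6}

ratio = difference_in_standard_errors(
    week1["mean"], week1["sd"], week1["n"],
    week3["mean"], week3["sd"], week3["n"],
)

# Rule of thumb (an assumption of this sketch): a difference smaller than about
# two standard errors can plausibly be explained by chance (common causes).
verdict = "possibly special causes" if abs(ratio) > 2 else "explainable by chance"
print(round(ratio, 2), verdict)
```

With these assumed values, the 6-point rise that looks ‘real’ to common sense is less than two standard errors and so cannot be distinguished from chance variation; with a smaller spread or more observations, the verdict could change.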
Regarding experimentation, it has to be stated that choosing the key factors that affect the response from among the several available, and selecting the levels of each of the key factors, are decisions to be made on a technical basis, whereas planning the experiments and analysing the data obtained from them belong strictly to the domain of statistical methodology. This is the subject matter of Chapter 25.
In the context of investigation it is relevant to know the importance of data on results and process. Data on results of a process as well as on process itself are necessary in any process improvement study. Data on results are measurements made at the end of a process like its yield, recovery, rejection, consumption of utilities, utilisation etc. Such data are generally available and they measure process efficiency / results.
In contrast to this, there exist data on process parameters like speed, rate of addition, concentration, temperature, pressure, etc., which measure how the process operates. Such data on the process are generally available in a processing operation and/or where such data have to be maintained as per a mandate or rule. Data on a process are similar to the on-line monitoring of the parameters of the cardiac condition of a patient in an intensive care unit. Action on these parameters of cardiac condition (how the process operates) impacts the patient and improves his well-being, which is the end result of cardiac treatment. Likewise, there are monitoring devices to reflect the improvement made in skill, technique, stamina and concentration in a sports person after undergoing coaching sessions; these are the parameters that go to make a good sports person. Review of and action on these parameters enhance the end result, that is, achieving a certain goal or target.
Thus the point to be noted is that data on the end of the process need to be examined to judge the need for measurements on the process, and such data need to be collected and acted upon to achieve the desired result. These two aspects of process measurement, at the end of the process and on the process, are explained through a case example in Chapter 27.
Simple techniques of statistical analysis which provide an answer to the question whether a result is due to chance or a special cause are given in Chapters 23 to 25. They cover measurement as well as attribute data.
There are a number of software packages for statistical analysis, and they need to be used to save time and effort. But the vigour and thrust to improve a project, and the enrichment of one’s intuition and insight to probe into a problem, should come not from the use of software packages but from one’s understanding of statistical logic and its significance, skill in seeking patterns, flair for figures, etc. The probing skills need to be developed. Consistent and active involvement in problem solving brings forth data-based probing skills.
Measuring devices are used to collect data and to decide on product acceptance. Hence, the quality aspects of measuring instruments and devices are important, and anyone dealing with continual improvement tasks has to be aware of the quality of measuring instruments and devices. This aspect is dealt with extensively in Chapter 21.
Interpretation of results of the analysis of data can be described as follows:
Interpretation = Points arising out of the analysis of data
+ technical knowledge associated with data
+ intuition (optional)
Misinterpretation is the result of ignoring these linkages. One should guard against misinterpretation. A knowledge of the types of common misinterpretations helps in arriving at correct interpretation. The following examples illustrate the point.
Five-year plans did usher in benefits but the society also expanded. Result: standard of living did not improve. Reason: the strong social belief that the God who bestows children also protects, was ignored but the ‘borrowed’ assumption already stated was relied upon.
It is illustrated as under.
Particulars | Data |
---|---|
a) Lunch and dinner prepared (ave/day) | 2500 |
b) No. of patients (ave/day) | 800 |
From these data, the average number of lunches and dinners that should have been prepared per day is 1600 [800 × (1 lunch + 1 dinner)]. But the actual number prepared is 2500. Hence, on average, an excess of 900 lunches and dinners is prepared per day.
Is this conclusion valid? This has to be looked into and here lies the value of interpretation and merit of understanding the data properly. Details are as follows:
Data on lunches and dinners are from the dietary section.
Data on the number of patients per day are collected by the nursing section. In a hospital where there are constant admissions and discharges, how were the data on the number of patients in a day obtained? This question, when posed to the nursing chief, brought forth the reply that the data are collected through the midnight census, when there are practically no admissions or discharges. The pertinent point to note is that one should know all the details about how the data are obtained.
Coming to interpretation, it was pointed out that the figure 1600 derived from 800 is not comparable with the figure 2500, because they are obtained on totally different bases. Hence the rule: apples and oranges cannot be compared.
In interpreting, the environmental situation also plays a key role. In this case, it was noted that the patients got admitted in time to have their lunch and got discharged after having their dinner. Thus, it was shown that there was no excess preparation of lunch or dinner. This example is from an actual case handled by one of the authors in a large hospital.
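The arithmetic behind the apparent excess, and one way of reconciling it, can be sketched as follows. The reconciliation rests on an assumption not stated in the text, namely that every patient fed on a given day takes exactly one lunch and one dinner.

```python
meals_prepared_per_day = 2500   # lunches + dinners, from the dietary section
midnight_census = 800           # patients counted by the nursing section

# Naive comparison: two meals for every patient in the midnight census.
expected_meals = midnight_census * 2
apparent_excess = meals_prepared_per_day - expected_meals

# Assumption of this sketch: each patient fed on a day has one lunch and one dinner.
patients_fed = meals_prepared_per_day // 2

# Patients who ate during the day but had been discharged before the midnight count.
not_in_census = patients_fed - midnight_census

print(expected_meals, apparent_excess, patients_fed, not_in_census)
```

Under that assumption, the ‘excess’ of 900 meals corresponds to about 450 patients a day who were present at meal times but absent from the midnight census, which is what the admission and discharge pattern described above would produce.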
Institution | Per cent success |
---|---|
A | 100% |
B | 80% |
When it comes to a percentage, ask the question ‘Per cent of how many?’ This question brings out the fallacy, if any. In this example, in the case of A it was 100 per cent of 3, and in the case of B it was 80 per cent of 300.
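How much weight each percentage can bear may be illustrated by attaching a range of plausible values to it. The sketch below uses the Wilson score interval at roughly 95 per cent confidence; this technique is brought in here purely for illustration and is not part of the original example. The counts 3 out of 3 and 240 out of 300 follow from the figures quoted above.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Counts implied by the text: 100 per cent of 3 and 80 per cent of 300.
institutions = {"A": (3, 3), "B": (240, 300)}

intervals = {name: wilson_interval(s, n) for name, (s, n) in institutions.items()}
for name, (low, high) in intervals.items():
    s, n = institutions[name]
    print(f"{name}: {s}/{n} = {s / n:.0%}, plausible range {low:.0%} to {high:.0%}")
```

The interval for A is very wide because it rests on only three cases, whereas the interval for B is narrow; the headline ‘100% versus 80%’ says far less about A than it appears to.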
Such a case reflects the use of an instrument not fit for the measured item.
Today, the right to information has been enacted as a law; the emphasis is on e-governance, with transparency, speed and instantaneous information on the status of a case as its cardinal principles. All these need a database. Data are needed to formulate correct policy and bring about appropriate reforms. Such a data system is not yet available in every sector of governance: legislature, judiciary and executive. India has one of the best statistical systems in the world to collect and correlate data on many sectors of society at the national level. But not every sector can yet give such an account. For example, Rajeev Dhawan in his article ‘Figuring out the Judiciary’ (The Hindu 06 August 2004) laments ‘unfortunately, judicial reforms in India are based on intuition and ideas, not on data’. Hence, the focus needs to be on providing a database and also on validating intuition and ideas through data, so that the actions proposed achieve their set objectives.
While handling data, the following two disciplines need to be kept in view:
Statistical techniques for the analysis of data, as well as for the design and analysis of industrial experiments relevant to the investigation of a problem and to improvement studies, are covered in the next section, Section E. Statistical aspects of a measurement system are also covered.
Data on thickness of metal strip (mm)
Process is far too inferior to comply with the thickness requirement of 10.00 ± 0.50 mm.