The regression equations are

$$y = 16 + 1.6x$$

and

$$x = -1 + 0.4y$$

Example 6.10: For the following data, find the regression line of y on x:

x 1 2 3 4 5 8 10
y 9 8 10 12 14 16 15

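Solution: We have

x y xy x^2
1 9 9 1
2 8 16 4
3 10 30 9
4 12 48 16
5 14 70 25
8 16 128 64
10 15 150 100
Total 33 84 451 219

$$\sum x_i = 33,\quad \sum y_i = 84,\quad \sum x_i^2 = 219,\quad \sum x_i y_i = 451$$

$$\bar{x} = \frac{\sum x_i}{n} = \frac{33}{7} = 4.714$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{84}{7} = 12$$

and

$$b_{yx} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{7(451) - (33)(84)}{7(219) - (33)^2} = \frac{385}{444} = 0.867$$

The regression equation of y on x is

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

i.e.,

$$y - 12 = 0.867(x - 4.714)$$

or

$$y = 0.867x + 7.9129$$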
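The computation in Example 6.10 can be reproduced with a short script. This is a minimal sketch: the function name `regression_y_on_x` is illustrative, and the slope is computed from the raw sums exactly as in the formula for b above.

```python
# Least-squares line of y on x from raw sums (data of Example 6.10).
xs = [1, 2, 3, 4, 5, 8, 10]
ys = [9, 8, 10, 12, 14, 16, 15]


def regression_y_on_x(xs, ys):
    """Return (slope, intercept) of the line y = intercept + slope * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # b = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # The line passes through the point of means (x_bar, y_bar).
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept


slope, intercept = regression_y_on_x(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")
```

Because the script does not round b and the mean of x before combining them, its intercept (about 7.912) differs in the fourth significant figure from the hand-computed 7.9129.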

Example 6.11: From the following data, fit two regression equations by finding actual means (of x and y), i.e., by actual means method.

x 1 2 3 4 5 6 7
y 2 4 7 6 5 6 5

Solution: We change the origin and find the regression equations as follows:

We have

$$\bar{x} = \frac{\sum x_i}{n} = \frac{1+2+3+4+5+6+7}{7} = \frac{28}{7} = 4$$

$$\bar{y} = \frac{\sum y_i}{n} = \frac{2+4+7+6+5+6+5}{7} = \frac{35}{7} = 5$$

x y X=x-x̄ Y=y-ȳ X^2 Y^2 XY
1 2 -3 -3 9 9 9
2 4 -2 -1 4 1 2
3 7 -1 2 1 4 -2
4 6 0 1 0 1 0
5 5 1 0 1 0 0
6 6 2 1 4 1 2
7 5 3 0 9 0 0
Total 28 35 0 0 28 16 11

We have

$$\sum x_i = 28,\quad \sum y_i = 35,\quad \sum X_i = 0,\quad \sum Y_i = 0,\quad \sum X_i^2 = 28,\quad \sum Y_i^2 = 16,\quad \sum X_i Y_i = 11$$

$$b_{yx} = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{11}{28} = 0.3928 \approx 0.393$$

$$b_{xy} = \frac{\sum X_i Y_i}{\sum Y_i^2} = \frac{11}{16} = 0.6875 \approx 0.688$$

The regression equation of y on x is

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

$$y - 5 = 0.393(x - 4)$$

or

$$y = 0.393x + 3.428$$

And the regression equation of x on y is

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

$$x - 4 = 0.688(y - 5)$$

or

$$x = 0.688y + 0.56$$

The required equations are

$$y = 0.393x + 3.428$$

$$x = 0.688y + 0.56$$
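The actual means method of Example 6.11 can be sketched as follows. The helper name `regression_coefficients` is an assumption made for illustration; the code takes deviations from the means and forms the sums of squares and products, as in the table of the solution.

```python
# Actual means method (data of Example 6.11).
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2, 4, 7, 6, 5, 6, 5]


def regression_coefficients(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    dx = [x - x_bar for x in xs]  # X = x - x_bar
    dy = [y - y_bar for y in ys]  # Y = y - y_bar
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return x_bar, y_bar, sum_xy / sum_x2, sum_xy / sum_y2


x_bar, y_bar, b_yx, b_xy = regression_coefficients(xs, ys)
# Intercepts follow from the fact that both lines pass through (x_bar, y_bar).
print(f"y on x: y = {b_yx:.3f}x + {y_bar - b_yx * x_bar:.3f}")
print(f"x on y: x = {b_xy:.3f}y + {x_bar - b_xy * y_bar:.3f}")
```

The intercepts printed here use the unrounded coefficients, so they can differ slightly in the last digit from the hand calculation, which rounds the coefficients to three decimals first.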

Example 6.12: From the following results obtain the two regression equations and estimate the yield of crops when the rainfall is 29 cm and the rainfall when the yield is 600 kg.

 Y (yield in kg) X (rainfall in cm)
Mean 508.4 26.7
Standard deviation 36.8 4.6

Coefficient of correlation between yield and rainfall is 0.52.

Solution: We have

$$\bar{x} = 26.7,\quad \bar{y} = 508.4$$

$$\sigma_x = 4.6,\quad \sigma_y = 36.8$$

and $r = 0.52$.

$$b_{yx} = r\frac{\sigma_y}{\sigma_x} = (0.52)\frac{36.8}{4.6} = 4.16$$

$$b_{xy} = r\frac{\sigma_x}{\sigma_y} = (0.52)\frac{4.6}{36.8} = 0.065$$

The regression equation of y on x is

$$y - \bar{y} = b_{yx}(x - \bar{x})$$

i.e.,

$$y - 508.4 = 4.16(x - 26.7)$$

or

$$y = 397.328 + 4.16x$$

When $x = 29$, we have

$$y = 397.328 + 4.16(29) = 517.968 \text{ kg}$$

The regression equation of x on y is

$$x - \bar{x} = b_{xy}(y - \bar{y})$$

or

$$x - 26.7 = 0.065(y - 508.4)$$

or

$$x = -6.346 + 0.065y$$

When $y = 600$ kg,

$$x = -6.346 + 0.065(600) = 32.654 \text{ cm}$$

The regression equations are

$$y = 397.328 + 4.16x$$

$$x = -6.346 + 0.065y$$

When the rainfall is 29 cm the yield of the crop is 517.968 kg, and when the yield is 600 kg the rainfall is 32.654 cm.
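When only the means, standard deviations, and r are known, as in Example 6.12, both regression lines follow directly from the formulas for the regression coefficients. The sketch below assumes a hypothetical helper `line_from_summary`; it is one way to organize the arithmetic, not a prescribed method.

```python
def line_from_summary(mean_pred, mean_resp, sd_pred, sd_resp, r):
    """Return (slope, intercept) of the regression of the response on the predictor.

    slope = r * sd_resp / sd_pred, and the line passes through the two means.
    """
    slope = r * sd_resp / sd_pred
    intercept = mean_resp - slope * mean_pred
    return slope, intercept


# Summary statistics of Example 6.12: x = rainfall (cm), y = yield (kg).
x_bar, y_bar = 26.7, 508.4
sd_x, sd_y = 4.6, 36.8
r = 0.52

b_yx, a_yx = line_from_summary(x_bar, y_bar, sd_x, sd_y, r)  # y on x
b_xy, a_xy = line_from_summary(y_bar, x_bar, sd_y, sd_x, r)  # x on y

yield_at_29 = a_yx + b_yx * 29       # estimated yield when rainfall is 29 cm
rainfall_at_600 = a_xy + b_xy * 600  # estimated rainfall when yield is 600 kg
print(f"yield = {yield_at_29:.3f} kg, rainfall = {rainfall_at_600:.3f} cm")
```

Swapping the roles of predictor and response in the call is what switches between the line of y on x and the line of x on y.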

Example 6.13: Find the most likely price of a commodity in Bombay corresponding to the price of Rs. 70 at Calcutta from the following:

 Calcutta Bombay
Average price 65 67
Standard deviation 2.5 3.5

Correlation coefficient between the prices of the commodity in the two cities is 0.8.

Solution: Let x denote the price at Calcutta and y the price at Bombay. We have

$$\bar{x} = 65,\quad \bar{y} = 67$$

$$\sigma_x = 2.5,\quad \sigma_y = 3.5$$

and $r = 0.8$. The regression equation of y on x is

$$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$

$$y - 67 = (0.8)\left(\frac{3.5}{2.5}\right)(x - 65)$$

$$y = 67 + 1.12x - 72.8$$

$$y = -5.8 + 1.12x$$

When $x = 70$,

$$y = -5.8 + 1.12 \times 70 = -5.8 + 78.4$$

$$y = 72.60$$

The price of the commodity in Bombay corresponding to Rs. 70 at Calcutta is Rs. 72.60.

Example 6.14: The regression equations calculated from a given set of observations are

$$x = -0.4y + 6.4$$

and

$$y = -0.6x + 4.6$$

Calculate $\bar{x}$, $\bar{y}$, and $r_{xy}$.

Solution: We have

$$x = -0.4y + 6.4 \qquad (6.19)$$

and

$$y = -0.6x + 4.6 \qquad (6.20)$$

Substituting Eq. (6.19) into Eq. (6.20), we have

$$y = -0.6(-0.4y + 6.4) + 4.6$$

$$y = 0.24y - 3.84 + 4.6$$

$$0.76y = 0.76$$

$$y = 1$$

From Eq. (6.19), we have $x = -0.4 \times 1 + 6.4 = 6.0$.

But $(\bar{x}, \bar{y})$ is the point of intersection of Eqs. (6.19) and (6.20). Hence

$$(\bar{x}, \bar{y}) = (6, 1)$$

$$\bar{x} = 6,\quad \bar{y} = 1$$

Clearly, Eq. (6.19) is the regression equation of x on y and Eq. (6.20) is the regression equation of y on x.

We have

$$b_{yx} = -0.6,\quad b_{xy} = -0.4$$

and

$$r^2 = (-0.4)(-0.6) = 0.24$$

$$r = r_{xy} = \pm\sqrt{0.24}$$

Since $b_{yx}$ and $b_{xy}$ are both negative, $r = r_{xy}$ is negative:

$$r_{xy} = -\sqrt{0.24} \approx -0.49$$

Example 6.15: Show that the coefficient of correlation is the geometric mean (GM) of the coefficients of regression.

Solution: The coefficients of regression are $r(\sigma_x/\sigma_y)$ and $r(\sigma_y/\sigma_x)$.

The GM of the regression coefficients is

$$\sqrt{r\frac{\sigma_x}{\sigma_y} \cdot r\frac{\sigma_y}{\sigma_x}} = \sqrt{r^2} = r = \text{coefficient of correlation}$$

Example 6.16: In a partially destroyed laboratory record of an analysis of correlation data, the following results are legible:

Variance of x=9

Regression equations: $8x - 10y + 66 = 0$, $40x - 18y = 214$

What were

i. The mean values of x and y

ii. The standard deviation of y

iii. The coefficient of correlation between x and y.

Solution: Variance of x = 9, i.e., $\sigma_x^2 = 9 \Rightarrow \sigma_x = 3$.

Solving the regression equations

$$8x - 10y + 66 = 0 \qquad (6.21)$$

$$40x - 18y = 214 \qquad (6.22)$$

we obtain

$$x = 13,\quad y = 17$$

Since the point of intersection of the regression lines is $(\bar{x}, \bar{y})$, we have

$$(\bar{x}, \bar{y}) = (x, y) = (13, 17)$$

Therefore $\bar{x} = 13$, $\bar{y} = 17$.

The regression Eqs. (6.21) and (6.22) can be written as

$$y = 0.8x + 6.6 \qquad (6.23)$$

$$x = 0.45y + 5.35 \qquad (6.24)$$

The regression coefficient of y on x is

$$r\frac{\sigma_y}{\sigma_x} = 0.8 \qquad (6.25)$$

and the regression coefficient of x on y is

$$r\frac{\sigma_x}{\sigma_y} = 0.45 \qquad (6.26)$$

Multiplying Eqs. (6.25) and (6.26), we get

$$r^2 = 0.45 \times 0.8$$

$$r^2 = 0.36$$

$$r = 0.6$$

Putting the values of $r$ and $\sigma_x$ in Eq. (6.25), we get the value of $\sigma_y$ as follows:

$$r\frac{\sigma_y}{\sigma_x} = 0.8$$

$$(0.6)\frac{\sigma_y}{3} = 0.8$$

$$\sigma_y = \frac{0.8}{0.2} = 4$$
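The steps of Example 6.16 (intersect the two lines to get the means, multiply the regression coefficients to get r squared, then recover the unknown standard deviation) can be sketched in code. The function name `summary_from_lines` and its argument names are assumptions made for this illustration; the lines are taken in the solved forms x = a_x + b_xy·y and y = a_y + b_yx·x.

```python
import math


def summary_from_lines(a_x, b_xy, a_y, b_yx, sd_x):
    """Recover (x_bar, y_bar, r, sd_y) from the two regression lines.

    Lines: x = a_x + b_xy * y  (x on y)  and  y = a_y + b_yx * x  (y on x).
    """
    product = b_xy * b_yx  # equals r squared
    if not 0 < product <= 1:
        raise ValueError("b_xy * b_yx must lie in (0, 1] for valid regression lines")
    # Substituting the second line into the first gives x_bar directly.
    x_bar = (a_x + b_xy * a_y) / (1 - product)
    y_bar = a_y + b_yx * x_bar
    # r carries the common sign of the two regression coefficients.
    r = math.copysign(math.sqrt(product), b_yx)
    # From b_yx = r * sd_y / sd_x.
    sd_y = b_yx * sd_x / r
    return x_bar, y_bar, r, sd_y


# Eqs. (6.23) and (6.24) with the given variance of x equal to 9.
x_bar, y_bar, r, sd_y = summary_from_lines(5.35, 0.45, 6.6, 0.8, sd_x=3)
print(f"means = ({x_bar:.2f}, {y_bar:.2f}), r = {r:.2f}, sd_y = {sd_y:.2f}")
```

The check on the product guards against assigning the two equations to the wrong roles: if the product of the slopes exceeded 1, the lines would have to be swapped. The formula for the means breaks down when the product equals 1, since the two lines then coincide.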

Example 6.17: If one of the regression coefficients is greater than unity, show that the other regression coefficient is less than unity.

Solution: Let one of the regression coefficients, say $b_{yx}$, be greater than 1.

Then

$$b_{yx} > 1 \Rightarrow \frac{1}{b_{yx}} < 1$$

Since

$$b_{yx} \cdot b_{xy} = r^2 \le 1$$

we have

$$b_{xy} \le \frac{1}{b_{yx}}$$

$$\Rightarrow b_{xy} < 1 \quad \left(\because \frac{1}{b_{yx}} < 1\right)$$

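Example 6.18: Show that the arithmetic mean of the regression coefficients is greater than the correlation coefficient.

Solution: We have to show that $\frac{b_{xy} + b_{yx}}{2} > r$.

Consider

$$(\sigma_y - \sigma_x)^2 > 0$$

(the square of a nonzero real quantity is always positive, so this holds whenever $\sigma_x \ne \sigma_y$). Expanding,

$$\sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y > 0$$

or

$$\frac{\sigma_x^2}{\sigma_y\sigma_x} + \frac{\sigma_y^2}{\sigma_y\sigma_x} > 2$$

or

$$\frac{\sigma_x}{\sigma_y} + \frac{\sigma_y}{\sigma_x} > 2$$

Multiplying both sides by $r$ (taken to be positive),

$$r\frac{\sigma_x}{\sigma_y} + r\frac{\sigma_y}{\sigma_x} > 2r$$

or

$$b_{xy} + b_{yx} > 2r$$

or

$$\frac{b_{xy} + b_{yx}}{2} > r$$

Hence proved.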
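The inequality of Example 6.18 can be spot-checked numerically. The grid of values below is arbitrary and chosen only for illustration; the check uses a non-strict comparison because the two sides are equal when the standard deviations coincide.

```python
import itertools


def regression_coefficients(r, sd_x, sd_y):
    """Return (b_xy, b_yx) for correlation r and standard deviations sd_x, sd_y."""
    return r * sd_x / sd_y, r * sd_y / sd_x


# Arbitrary illustrative grid of positive correlations and standard deviations.
r_values = [0.1, 0.5, 0.9]
sd_values = [0.5, 1.0, 2.0, 7.5]

all_hold = True
for r, sd_x, sd_y in itertools.product(r_values, sd_values, sd_values):
    b_xy, b_yx = regression_coefficients(r, sd_x, sd_y)
    arithmetic_mean = (b_xy + b_yx) / 2
    # Small tolerance for floating-point rounding in the equality case.
    if arithmetic_mean < r - 1e-12:
        all_hold = False

print("inequality holds on the whole grid:", all_hold)
```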

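Example 6.19: Given that $x = 4y + 5$ and $y = kx + 4$ are two lines of regression, show that $0 \le k \le 1/4$. If $k = 1/8$, find the means of the variables and the ratio of their variances.

Solution:

$$x = 4y + 5 \Rightarrow b_{xy} = 4$$

$$y = kx + 4 \Rightarrow b_{yx} = k$$

$$r^2 = b_{xy} b_{yx} = 4k$$

but

$$-1 \le r \le 1$$

$$\Rightarrow 0 \le r^2 \le 1$$

$$\Rightarrow 0 \le 4k \le 1$$

$$\Rightarrow 0 \le k \le \frac{1}{4}$$

When $k = \frac{1}{8}$, $r^2 = 4 \cdot \frac{1}{8} = \frac{1}{2} \Rightarrow r = 0.7071$.

$$y = kx + 4$$

or

$$y = \frac{1}{8}x + 4$$

or

$$8y = x + 32$$

or, substituting $x = 4y + 5$,

$$8y = 4y + 5 + 32$$

or

$$4y = 37 \Rightarrow y = 9.25$$

Now

$$x = 4y + 5 \Rightarrow x = 4(9.25) + 5$$

or

$$x = 42$$

Since the point of intersection of the regression lines is $(\bar{x}, \bar{y})$,

$$(\bar{x}, \bar{y}) = (x, y) = (42, 9.25)$$

i.e.,

$$\bar{x} = 42,\quad \bar{y} = 9.25$$

Hence

$$\frac{b_{xy}}{b_{yx}} = \frac{r\,\sigma_x/\sigma_y}{r\,\sigma_y/\sigma_x} = \frac{\sigma_x^2}{\sigma_y^2} = \frac{4}{1/8} = 32$$

i.e.,

$$\frac{\sigma_x^2}{\sigma_y^2} = 32$$

The ratio of the variances is 32:1.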

Multiple correlation is a statistical technique that predicts the value of one variable on the basis of two or more other variables.

Regression analysis involves identifying the relationship between a dependent variable and one or more independent variables. In simple linear regression, the model describes the relationship between a single dependent variable y and a single independent variable x. Multiple regression extends this to two or more independent variables. The multiple correlation coefficient is denoted by R. It is never negative, and there are several ways to compute its value. R lies between 0 and 1, and it tells only the strength of association between the dependent and independent variables.

If R = 0, there is no linear association; if R = 1, there is a perfect linear relationship between the dependent variable and the independent variables. Correlation and regression analysis are related in the sense that both deal with relationships among variables.

6.18 Multilinear Regression

In some cases the value of a variate may not depend only on a single variable. It may happen that there are several variables which, when taken jointly, serve as a satisfactory basis for estimating the desired variable. If $x_1, x_2, \ldots, x_k$ represent the independent variables and $y'$ is the variable which is to be predicted, the regression equation is

$$y' = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k$$

The unknown coefficients $a_0, a_1, a_2, \ldots, a_k$ are estimated by the method of least squares. To estimate them, we use n sets of values of the (k + 1) variables. Geometrically, the problem is one of finding the equation of the hyperplane that best fits, in the sense of least squares, n points in (k + 1)-dimensional space. The normal equations are

$$n a_0 + a_1\sum x_1 + a_2\sum x_2 + \cdots + a_k\sum x_k = \sum y$$

$$a_0\sum x_1 + a_1\sum x_1^2 + a_2\sum x_1 x_2 + \cdots + a_k\sum x_1 x_k = \sum x_1 y$$

$$\vdots$$

$$a_0\sum x_k + a_1\sum x_1 x_k + a_2\sum x_k x_2 + \cdots + a_k\sum x_k^2 = \sum x_k y$$

If there are two independent variables, say $x_1$ and $x_2$, the normal equations are

$$n a_0 + a_1\sum x_1 + a_2\sum x_2 = \sum y$$

$$a_0\sum x_1 + a_1\sum x_1^2 + a_2\sum x_1 x_2 = \sum x_1 y$$

$$a_0\sum x_2 + a_1\sum x_1 x_2 + a_2\sum x_2^2 = \sum x_2 y$$

and the regression equation is

$$y' = a_0 + a_1 x_1 + a_2 x_2$$
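For two independent variables, the three normal equations can be assembled from the data and solved as a 3×3 linear system. The sketch below uses a small hypothetical data set (not from the text) generated from y = 1 + 2x₁ + 3x₂, so the fitted coefficients should come out close to 1, 2, and 3. The function names are illustrative, and Gaussian elimination is just one convenient way to solve the system.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Return [a0, a1, a2] from the normal equations for y' = a0 + a1*x1 + a2*x2."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * w for u, w in zip(x1, y))
    s2y = sum(v * w for v, w in zip(x2, y))
    lhs = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [sy, s1y, s2y]
    return solve_linear_system(lhs, rhs)


# Hypothetical data constructed from y = 1 + 2*x1 + 3*x2 (illustration only).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]

a0, a1, a2 = fit_two_predictors(x1, x2, y)
print(f"y' = {a0:.3f} + {a1:.3f} x1 + {a2:.3f} x2")
```

The same construction extends to k predictors by building a (k+1)×(k+1) matrix of sums; the solver above does not depend on the system size.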

6.19 Uses of Regression Analysis

There are many uses of regression analysis. In many situations, the dependent variable y cannot be measured directly. In such cases, the value of y is estimated with the help of auxiliary variables, which are taken as independent variables in a regression. A regression equation is often used as a prediction equation. Regression analysis is used in predicting the yield of a crop for different doses of a fertilizer and in predicting future demand for food. It is also used to estimate the height of a person at a given age, by finding the regression of height on age.

Exercise 6.2

1. Heights of fathers and sons are given below in inches:

Height of father 65 66 67 67 68 69 71 73
Height of son 67 68 64 68 72 70 69 70

Form the lines of regression and calculate the average height of the son when the height of father is 67.5 in.

Ans: When father’s height is 67.5 in. the son’s height is 68.19 in.

2. For the following data, determine the regression lines:

x 6 2 10 4 8
y 9 11 5 8 7

Ans: $y = 11.9 - 0.65x$, $x = 16.4 - 1.3y$

3. Find the regression equations for the following data:

Age of husband(x) 36 23 27 28 28 29 30 31 33 35
Age of wife(y) 29 18 20 22 27 21 29 27 29 28

Ans: $y = -1.739 + 0.8913x$, $x = 11.25 + 0.75y$

4. By the method of least squares find the regression equation of y on x and the value of y when x=4; also find the regression equation of x on y and the value of x when y=24. Use the table given below:

x 1 3 5 7 9
y 15 18 21 23 22

Ans: $y = 15.05 + 0.95x$; the value of y when x=4 is 18.85. $x = -12.58 + 0.88y$; the value of x when y=24 is 8.73.

5. Using the method of least squares find the two regression equations for the data given below:

x 5 10 15 20 25
y 20 40 30 60 50

Ans: $y = 16 + 1.6x$, $x = -1 + 0.4y$

6. Define regression and find the regression equation of y on x for the data given below:

x 2 6 4 3 2 2 8 4
y 7 2 1 1 2 3 2 6

Ans: $y = 4.16 - 0.3x$

7. From the following data, obtain the two regression equations:

Sales 91 97 108 121 67 124 51
Purchase 71 75 69 97 70 91 39

Ans: $y = 15.998 + 0.607x$, $x = 0.081 + 1.286y$

8. Find the equation of regression lines for the following pairs for the variables x and y: (1, 2), (2, 5), (3, 3), (4, 8), (5, 7).

Ans: $y = 1.1 + 1.3x$, $x = 0.5 + 0.5y$

9. From the following data, find the yield of wheat in kilogram per unit area when the rainfall is 9 in.

 Means SD
Yield of wheat (kg) 10 8
Annual rainfall (in.) 8 2

Ans: 12 kg

10. Show that the regression coefficients are independent of the change of origin but not the change of scale.

11. Given $\sum x_i = 60$, $\sum y_i = 40$, $\sum x_i^2 = 4160$, $\sum y_i^2 = 1$, $\sum x_i y_i = 1150$, $n = 10$, find the regression equation of x on y and find r.

Ans: $x = 3.68 + 0.58y$, $r = 0.37$

12. Using the data given below find the demand when the price of the quantity is Rs.12.50:

 Price Demand
Means 10 35
Standard deviation 2 5


Coefficient of correlation (r)=0.8

Ans: $y = 15 + 2x$, demand = 40,000 units

13. Find the means of $x_i$ and $y_i$, and the coefficient of correlation, given

$$2y - x - 50 = 0, \qquad 3y - 2x - 10 = 0$$

Ans: $\bar{x} = 130$, $\bar{y} = 90$, $r = 0.866$

14. From the following data obtain the two regression equations and calculate the correlation coefficient:

X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15

Ans: $x = -6.4 + 0.95y$, $y = 7.25 + 0.95x$

15. From the following regression equations: $8X - 10Y = -66$, $40X - 18Y = 214$, find

a. Average values of X and Y

b. Correlation coefficient between the two variables X and Y

c. Standard deviation of Y

Ans: (a) 13, 17 (b) $b_{yx} = 0.8$, $b_{xy} = 0.45$ (c) $\sigma_y = 4$

16. Find the most likely price in Bombay (X), corresponding to the price of Rs.70 at Calcutta (Y) from the following data:

 Bombay Calcutta
Average price 67 65
Standard deviation 3.5 2.5

Ans: 72.6

17. The following data based on 450 students are given for marks in statistics and economics at a certain examination:

 Statistics Economics
Mean marks 40 48
Standard deviation 12 16

Sum of the products of deviations of marks from their respective means is 42,075.
Give the equations to the two lines of regression.
Estimate the average marks of regression.

Ans: $x = 22.24 + 0.37y$, $y = 22 + 0.65x$, $y = 54.5$

18. From the following information calculate the line of regression of y on x:

 x y
Means 40 60
Standard deviation 10 15
Correlation coefficient 0.7  

Ans: $y = 1.05x + 18$

19. The correlation coefficient between two variables x and y is r=0.6. If $\sigma_x = 1.5$, $\sigma_y = 2.0$, $\bar{x} = 10$, $\bar{y} = 20$, find the regression lines of (a) y on x and (b) x on y.

Ans: $x = 0.45y + 1$, $y = 0.8x + 12$

20. The following table gives the age of cars for a certain make and the maintenance costs. Obtain the regression equation for costs related to age. Also estimate maintenance cost for a 10-year-old car.

Age of car (in years) 2 4 6 8
Maintenance cost (in Rs. hundred) 10 20 25 30

Ans: y=5 + 3.25x, Rs.37.50

21. Two brands of tyres are tested for their life and the following results were obtained:

Length of life 20–25 25–30 30–35 35–40 40–45
No. of tyres, X 1 22 64 10 3
No. of tyres, Y 3 21 74 1 1

If consistency is the criterion, which brand of tyres would you prefer?

Ans: Y brand of tyres

22. Calculate the coefficient of correlation between the age of cars and annual maintenance cost and comment

Age of cars (years) 2 4 6 7 8 10 12
Annual maintenance cost (Rs.) 1600 1500 1800 1900 1700 2100 2000

Ans: 0.836. There is a high degree of positive correlation between the age of cars and the annual maintenance cost.

23. Find the coefficient of correlation between output and cost of an automotive factory from the following data:

Output of cars (in 1000s) 3.5 4.2 5.6 6.5 7.0 8.2 8.8 9 9.7 10.0
Cost per car (in 1000s of Rs.) 9.8 9.0 8.8 8.4 8.3 8.2 8.2 8.0 8.0 8.1

Ans: r=−0.938

24. A factory produces two types of tyres. In an experiment in the working life of these tyres, the following results were obtained:

Length of life(in 100 h) 15–17 17–19 19–21 21–23 23–25
Type A 50 110 260 100 80
Type B 40 300 120 80 60

State which type of tyre is more stable.

Ans: Type A

Partial correlation: Partial correlation is a method used to describe the relationship between two variables while removing the effects of one or more other variables on this relationship.

A partial correlation coefficient is a measure of the linear dependence of a pair of random variables, from a collection of random variables, when the influence of the remaining variables is eliminated. Equivalently, it measures the strength of association between a dependent variable and an independent variable when the effect of all other independent variables is removed; it is equal to the square root of the partial coefficient of determination.

Multiple regression involves one continuous criterion (dependent) variable and two or more predictors (independent variables). The equation for the line of best fit is derived so as to minimize the sum of the squared deviations from the line. Although there are multiple predictors, there is only one predicted Y value, and the correlation between the observed and predicted Y values is called Multiple R. The value of Multiple R ranges from 0 to 1. In the case of bivariate correlation, a regression analysis yields a value of Multiple R that is the absolute value of the Pearson product-moment correlation coefficient between X and Y. The multiple linear regression equation takes the following general form:

$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$

The partial correlation procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association.

In simple correlation, we measure the strength of the linear relationship between two variables without taking into consideration the fact that both variables may be influenced by a third variable. Partial correlation aims at eliminating the effects of such other variables. The partial correlation coefficient of two variables X1 and X2, partialling out X3, is denoted by r12.3. It is given by

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

where r12 is the correlation coefficient between X1 and X2 ignoring any influence of X3, r13 is the correlation coefficient between X1 and X3 ignoring any influence of X2, and r23 is the correlation coefficient between X2 and X3 ignoring any influence of X1.

r12.3 lies between -1 and 1.

Example 6.20: If r12=0.41, r13=0.71, and r23=0.50 then r12.3=0.09.

Example 6.21: If r12=0.65, r13=0.60, and r23=0.90 then r12.3=0.32.
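The two values quoted in Examples 6.20 and 6.21 can be reproduced from the formula for r12.3. The function name `partial_correlation` is chosen here for illustration only.

```python
import math


def partial_correlation(r12, r13, r23):
    """First-order partial correlation of X1 and X2, controlling for X3."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2)
    return numerator / denominator


example_6_20 = partial_correlation(0.41, 0.71, 0.50)
example_6_21 = partial_correlation(0.65, 0.60, 0.90)
print(round(example_6_20, 2), round(example_6_21, 2))
```

The formula is undefined when r13 or r23 equals ±1, since the denominator is then zero; the sketch does not guard against that case.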

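Probable error: If r denotes the coefficient of correlation and n denotes the number of pairs of observations, then the probable error (PE) is given approximately by $\frac{2}{3}\left[\frac{1 - r^2}{\sqrt{n}}\right]$, where $\frac{1 - r^2}{\sqrt{n}}$ is the standard error. The probable error is also given by $\text{PE} = 0.6745\left[\frac{1 - r^2}{\sqrt{n}}\right]$.

If $\frac{r}{\text{PE}} > 6$, then r is significant.

Example 6.22: If n=100 and r=0.4, then

$$\text{PE} = 0.6745\left[\frac{1 - r^2}{\sqrt{n}}\right] = 0.6745\left[\frac{1 - (0.4)^2}{\sqrt{100}}\right] = 0.06$$

$$\frac{r}{\text{PE}} = \frac{0.4}{0.06} = 6.67 > 6$$

hence r is significant.

Example 6.23: If the value of Karl Pearson’s coefficient of correlation is r=0.9 and PE=0.04, find n.

Solution: We have

$$\text{PE} = 0.6745\left[\frac{1 - r^2}{\sqrt{n}}\right]$$

$$0.04 = 0.6745\left[\frac{1 - (0.9)^2}{\sqrt{n}}\right] \Rightarrow \sqrt{n} = 3.204 \Rightarrow n = 10.27$$

Hence n=10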
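Both directions of the probable-error calculation (PE from r and n, and n from r and PE) can be sketched in a few lines. The function names are illustrative assumptions, and the standard error is taken as (1 − r²)/√n, the form used in Example 6.22.

```python
import math


def probable_error(r, n):
    """Probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def pairs_for_probable_error(r, pe):
    """Invert the PE formula for n (generally not a whole number)."""
    return (0.6745 * (1 - r ** 2) / pe) ** 2


pe = probable_error(0.4, 100)                   # setting of Example 6.22
n_estimate = pairs_for_probable_error(0.9, 0.04)  # setting of Example 6.23
print(f"PE = {pe:.4f}, r/PE = {0.4 / pe:.2f}, n = {n_estimate:.2f}")
```

Without intermediate rounding, PE in the first case is about 0.0567 rather than 0.06, so the ratio r/PE comes out somewhat larger than the hand-computed value; the conclusion that r is significant is unchanged.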
