Gender classification - Bayes for continuous random variables

So far, every observed event belonged to one of a finite number of classes; for example, a temperature was classified as cold, warm, or hot. But how would we calculate the posterior probability if we were given the temperature in degrees Celsius instead?

For this example, we are given five men and five women with their heights as in the following table:

| Height in cm | Gender |
|--------------|--------|
| 180          | Male   |
| 174          | Male   |
| 184          | Male   |
| 168          | Male   |
| 178          | Male   |
| 170          | Female |
| 164          | Female |
| 155          | Female |
| 162          | Female |
| 166          | Female |
| 172          | ?      |

Suppose that the next person has a height of 172 cm. Which gender is that person more likely to be, and with what probability?

Analysis:

One approach to solving this problem could be to assign classes to the numerical values; for example, people with a height between 170 cm and 179 cm would be in the same class. With this approach, we may end up either with a few classes that are very wide (each spanning a large range of centimeters), or with classes that are narrower and more precise but have too few members for Bayes' theorem to give reliable estimates. This method also ignores the ordering of the classes: it does not take into account that the height intervals [170,180) and [180,190) are closer to each other than the intervals [170,180) and [190,200).
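To make the sparsity problem concrete, here is a minimal sketch that groups the ten people from the table into 10 cm classes. The helper name `height_class` and the choice of a 10 cm width are assumptions made for illustration only.

```python
from collections import Counter

# Heights (cm) and genders copied from the table above.
people = [
    (180, "Male"), (174, "Male"), (184, "Male"), (168, "Male"), (178, "Male"),
    (170, "Female"), (164, "Female"), (155, "Female"), (162, "Female"), (166, "Female"),
]

def height_class(height_cm, width=10):
    """Return the half-open interval [low, low + width) that contains height_cm."""
    low = (height_cm // width) * width
    return (low, low + width)

# Count how many people of each gender fall into each height class.
counts = Counter((height_class(h), g) for h, g in people)
for (interval, gender), n in sorted(counts.items()):
    print(interval, gender, n)
```

With these bins, the class [170,180) that contains 172 cm holds only three people (two male, one female), so any estimate based on it rests on very little data.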

Let us recall Bayes' formula:

P(male|height)=P(height|male)*P(male)/P(height)

=P(height|male)*P(male)/[P(height|male)*P(male)+P(height|female)*P(female)]

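Expressing the denominator in the final form above removes the need to compute P(height) separately, because it is built from the same class-conditional terms that appear in the numerator. It also means that P(height|male) and P(height|female) need not be normalized: any common scaling factor cancels between the numerator and the denominator, so we still get the correct probability of a person being male based on the measured height.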
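The cancellation of a common scaling factor can be checked with a small sketch. The function name `posterior` and the input numbers are hypothetical and serve only to illustrate the formula for two exhaustive classes.

```python
def posterior(likelihood_a, prior_a, likelihood_b, prior_b):
    """Return P(A | x) for two exhaustive classes A and B.

    The likelihoods may be probabilities or density values; any factor
    common to both cancels between the numerator and the denominator.
    """
    evidence = likelihood_a * prior_a + likelihood_b * prior_b
    return likelihood_a * prior_a / evidence

# Hypothetical likelihoods: the second pair is the first pair scaled by 10.
p1 = posterior(0.2, 0.5, 0.1, 0.5)
p2 = posterior(2.0, 0.5, 1.0, 0.5)
print(p1, p2)  # both are 2/3 up to floating-point rounding
```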

Assuming that the height of people is distributed normally, we could use a normal probability distribution to calculate P(male|height). We assume P(male)=0.5, that is, that it is equally likely that the person to be measured is of either gender. A normal probability distribution is determined by the mean μ and the variance σ² of the population, which we estimate here from the five heights of each gender using the sample mean and the sample variance (dividing by n-1):

| Gender | Mean of height (cm) | Variance of height (cm²) |
|--------|---------------------|--------------------------|
| Male   | 176.8               | 37.2                     |
| Female | 163.4               | 30.8                     |
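The values in this table can be reproduced with Python's standard `statistics` module. This is a sketch under the assumption that the variance is the sample variance (dividing by n-1), which is what `statistics.variance` computes and what matches the figures 37.2 and 30.8.

```python
import statistics

# Heights in cm, copied from the first table.
male = [180, 174, 184, 168, 178]
female = [170, 164, 155, 162, 166]

for label, heights in (("Male", male), ("Female", female)):
    mean = statistics.mean(heights)
    var = statistics.variance(heights)  # sample variance: divides by n - 1
    print(label, round(mean, 1), round(var, 1))
```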

Thus we could calculate the following:

P(height=172|male)=exp[-(172-176.8)^2/(2*37.2)]/[sqrt(2*37.2*π)]=0.04798962999

P(height=172|female)=exp[-(172-163.4)^2/(2*30.8)]/[sqrt(2*30.8*π)]=0.02163711333

Note that these are not probabilities, just values of the probability density function. However, because the priors P(male) and P(female) are equal, we can already observe from these values that a person with a measured height of 172 cm is more likely to be male than female, since P(height=172|male)>P(height=172|female). To be more precise:
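The two density values can be evaluated with a direct transcription of the normal density formula. The function name `normal_pdf` is an assumption for this sketch; the means and variances are the ones from the table.

```python
import math

def normal_pdf(x, mean, variance):
    """Value of the normal probability density function at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

density_male = normal_pdf(172, 176.8, 37.2)
density_female = normal_pdf(172, 163.4, 30.8)
print(density_male, density_female)  # roughly 0.04799 and 0.02164
```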

P(male|height=172)=P(height=172|male)*P(male)/[P(height=172|male)*P(male)+P(height=172|female)*P(female)]

=0.04798962999*0.5/[0.04798962999*0.5+0.02163711333*0.5]=0.68924134178≈68.9%

Therefore, the person with the measured height of 172 cm is male with a probability of 68.9%.
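The whole calculation can be reproduced in a few lines with `statistics.NormalDist` (available from Python 3.8). This is only a sketch: the function name `posterior_male` and its `prior_male` parameter are assumptions, and `NormalDist.from_samples` estimates the spread with the sample standard deviation, consistent with the variances used above.

```python
from statistics import NormalDist

# Heights in cm, copied from the first table.
male = [180, 174, 184, 168, 178]
female = [170, 164, 155, 162, 166]

def posterior_male(height_cm, prior_male=0.5):
    """Return P(male | height) under a normal model fitted to each gender."""
    male_dist = NormalDist.from_samples(male)
    female_dist = NormalDist.from_samples(female)
    # Density values weighted by the priors; their sum plays the role of P(height).
    weighted_male = male_dist.pdf(height_cm) * prior_male
    weighted_female = female_dist.pdf(height_cm) * (1 - prior_male)
    return weighted_male / (weighted_male + weighted_female)

print(round(posterior_male(172), 3))  # 0.689
```

Passing a different `prior_male` shows how the result would shift if the two genders were not assumed to be equally likely.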
