Gender classification - Bayes for continuous random variables

So far, each observed event belonged to one of a finite number of classes; for example, a temperature was classified as cold, warm, or hot. But how would we calculate the posterior probability if we were given the temperature in degrees Celsius instead?

For this example, we are given five men and five women with their heights as in the following table:

Height in cm	Gender
180	Male
174	Male
184	Male
168	Male
178	Male
170	Female
164	Female
155	Female
162	Female
166	Female
172	?

Suppose that the next person has a height of 172 cm. Which gender is that person more likely to be, and with what probability?

Analysis:

One approach to solving this problem would be to assign the numerical values to classes; for example, all people with a height between 170 cm and 179 cm would fall into the same class. With this approach, we end up either with a few very wide classes, each covering a large range of heights, or with narrower, more precise classes that each contain few members, so that there is too little data per class for Bayes' theorem to be of much use. This method also ignores the ordering of the classes: it does not take into account that the height intervals [170,180) and [180,190) are closer to each other than the intervals [170,180) and [190,200).
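The sparsity problem can be seen directly on the ten measurements. The following is a minimal sketch, assuming 10 cm wide classes and only Python's standard library; the names people and bin_start are illustrative and do not come from the original text.

```python
from collections import Counter

# Heights in cm and genders, copied from the table above.
people = [
    (180, "Male"), (174, "Male"), (184, "Male"), (168, "Male"), (178, "Male"),
    (170, "Female"), (164, "Female"), (155, "Female"), (162, "Female"), (166, "Female"),
]

def bin_start(height, width=10):
    """Lower bound of the half-open interval [start, start + width) containing height."""
    return (height // width) * width

# Count how many people of each gender fall into each 10 cm class.
counts = Counter((bin_start(h), g) for h, g in people)

for start in sorted({b for b, _ in counts}):
    males = counts[(start, "Male")]
    females = counts[(start, "Female")]
    print(f"[{start},{start + 10}): {males} male, {females} female")
```

With only ten people, every class holds four members at most, so any probability estimated per class rests on very little data.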

Let us remind ourselves of Bayes' formula here:

P(male|height)=P(height|male)*P(male)/P(height)

=P(height|male)*P(male)/[P(height|male)*P(male)+P(height|female)*P(female)]

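Expressing the formula in the final form above means that we never need P(height) on its own, and that the values used for P(height|male) and P(height|female) do not have to be normalized: any common scaling factor cancels between the numerator and the denominator, so we still get the correct probability of a person being male based on the measured height.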

Assuming that the height of people is distributed normally, we can use the normal probability density to evaluate P(height|male) and P(height|female), and from these calculate P(male|height). We assume P(male)=0.5, that is, that it is equally likely that the person to be measured is of either gender. A normal probability distribution is determined by its mean μ and its variance σ², which we estimate here from the five measurements for each gender (the variance is the sample variance, with divisor n-1):

Gender	Mean of height (cm)	Variance of height (cm²)
Male	176.8	37.2
Female	163.4	30.8
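The means and variances in this table can be reproduced from the ten measurements. The following is a minimal sketch, assuming Python's standard statistics module and the sample variance (divisor n-1), which is the convention that matches the figures 37.2 and 30.8; the variable names are illustrative.

```python
from statistics import mean, variance

# Heights in cm, copied from the table of measurements.
male_heights = [180, 174, 184, 168, 178]
female_heights = [170, 164, 155, 162, 166]

# statistics.variance computes the sample variance (divisor n - 1).
male_mean, male_var = mean(male_heights), variance(male_heights)
female_mean, female_var = mean(female_heights), variance(female_heights)

print("Male:  ", round(male_mean, 1), round(male_var, 1))
print("Female:", round(female_mean, 1), round(female_var, 1))
```

Using statistics.pvariance instead (divisor n) would give smaller variances and slightly different densities in the calculation that follows.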

Thus we could calculate the following:

P(height=172|male)=exp[-(172-176.8)^2/(2*37.2)]/sqrt(2*37.2*π)=0.04798962999

P(height=172|female)=exp[-(172-163.4)^2/(2*30.8)]/sqrt(2*30.8*π)=0.02163711333
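These two density values can be checked numerically. The following is a minimal sketch using only Python's math module; the helper name normal_pdf is an assumption made for illustration, and the means and variances are taken from the table above.

```python
import math

def normal_pdf(x, mu, var):
    """Value at x of the normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Density of the measured height 172 cm under each gender's distribution.
density_male = normal_pdf(172, 176.8, 37.2)
density_female = normal_pdf(172, 163.4, 30.8)

print(density_male, density_female)
```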

Note that these are not probabilities, just the values of the probability density function. However, from these values, we can already observe that, with equal priors for both genders, a person with a measured height of 172 cm is more likely to be male than female because P(height=172|male)>P(height=172|female). To be more precise:

P(male|height=172)=P(height=172|male)*P(male)/[P(height=172|male)*P(male)+P(height=172|female)*P(female)]

=0.04798962999*0.5/[0.04798962999*0.5+0.02163711333*0.5]=0.68924134178~68.9%

Therefore, the person with the measured height of 172 cm is male with a probability of about 68.9%.
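The whole calculation can be put together in a few lines. The following is a minimal sketch, assuming normally distributed heights, the means and variances from the table above, and a prior of P(male)=0.5; the function names normal_pdf and posterior_male are hypothetical helpers, not part of any library.

```python
import math

def normal_pdf(x, mu, var):
    """Value at x of the normal density with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_male(height, male=(176.8, 37.2), female=(163.4, 30.8), prior_male=0.5):
    """P(male | height) by Bayes' formula with normal class-conditional densities.

    male and female are (mean, variance) pairs; the defaults are the
    values estimated from the ten measurements in this example.
    """
    weighted_male = normal_pdf(height, *male) * prior_male
    weighted_female = normal_pdf(height, *female) * (1 - prior_male)
    return weighted_male / (weighted_male + weighted_female)

p_male = posterior_male(172)
print(f"P(male | height=172) = {p_male:.3f}")
```

Calling posterior_male with other heights shows how the probability moves toward 1 for taller people and toward 0 for shorter ones, which is what the interval-based classes could not capture smoothly.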
