Playing chess - dependent events

Suppose that we would like to find out again whether our friend would like to play chess with us in a park in Cambridge, UK. This time, however, we are given different input data:

Temperature   Wind     Season   Play
Cold          Strong   Winter   No
Warm          Strong   Autumn   No
Warm          None     Summer   Yes
Hot           None     Spring   No
Hot           Breeze   Autumn   Yes
Warm          Breeze   Spring   Yes
Cold          Breeze   Winter   No
Cold          None     Spring   Yes
Hot           Strong   Summer   Yes
Warm          None     Autumn   Yes
Warm          Strong   Spring   ?

How will the answer change with this different data?

Analysis:

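We may be tempted to use the naive Bayes method to calculate the probability of our friend playing chess with us in the park. However, we should be careful and first ask whether the random variables are independent of each other, because naive Bayes assumes that they are (more precisely, that they are conditionally independent given the class).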

In the previous example, where we used naive Bayes, we were given the random variables Temperature, Wind, and Sunshine. These are reasonably independent. Common sense tells us that a specific temperature or amount of sunshine is not strongly correlated with a specific wind speed. It is true that sunny weather results in higher temperatures, but sunny weather is common even when temperatures are very low. Hence, we considered even sunshine and temperature to be reasonably independent random variables and applied the naive Bayes method.

However, in this example, temperature and season are tightly related, especially in a location such as the UK, where the park is located. Unlike in places closer to the equator, temperatures in the UK vary greatly throughout the year: winters are cold and summers are hot, while spring and autumn have temperatures in between.

Therefore, we cannot apply naive Bayes to all of the random variables here, as some of them are dependent. However, we could still perform some analysis with naive Bayes on part of the data: by eliminating enough of the dependent variables, the remaining ones could turn out to be independent. Since temperature is a more specific variable than season, and the two variables are dependent, let us keep only the temperature variable. The remaining two variables, temperature and wind, are reasonably independent.
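The argument for dropping Season rests on domain knowledge. As a rough, data-driven complement (not part of the method used in this chapter), we could compare the empirical mutual information between pairs of columns in the ten labelled rows: it is zero when two columns are independent within the sample and grows with the strength of their dependence. The function name `mutual_information` is hypothetical, and with only ten rows the estimates are noisy and tend to overstate dependence, so the numbers below are an illustration rather than a test.

```python
import math
from collections import Counter

# The ten labelled rows of the chess table: (Temperature, Wind, Season).
ROWS = [
    ("Cold", "Strong", "Winter"),
    ("Warm", "Strong", "Autumn"),
    ("Warm", "None", "Summer"),
    ("Hot", "None", "Spring"),
    ("Hot", "Breeze", "Autumn"),
    ("Warm", "Breeze", "Spring"),
    ("Cold", "Breeze", "Winter"),
    ("Cold", "None", "Spring"),
    ("Hot", "Strong", "Summer"),
    ("Warm", "None", "Autumn"),
]


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits, estimated from paired samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), with every probability written as count / n
        mi += (n_xy / n) * math.log2(n_xy * n / (px[x] * py[y]))
    return mi


temperature = [row[0] for row in ROWS]
wind = [row[1] for row in ROWS]
season = [row[2] for row in ROWS]

print(f"I(Temperature; Season) = {mutual_information(temperature, season):.2f} bits")
print(f"I(Temperature; Wind)   = {mutual_information(temperature, wind):.2f} bits")
```

On these rows, Temperature and Season share about 0.62 bits, while Temperature and Wind share only about 0.02 bits, which is consistent with keeping Temperature and Wind and dropping Season.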

Thus, we get the following data:

Temperature   Wind     Play
Cold          Strong   No
Warm          Strong   No
Warm          None     Yes
Hot           None     No
Hot           Breeze   Yes
Warm          Breeze   Yes
Cold          Breeze   No
Cold          None     Yes
Hot           Strong   Yes
Warm          None     Yes
Warm          Strong   ?

Dropping the Season column makes two rows identical: (Warm, None, Yes). We keep this duplicate, as it gives us greater evidence of how often that specific combination of values occurs.

Input:

Saving the table, we get the following CSV file:

# source_code/2/chess_reduced.csv
Temperature,Wind,Play
Cold,Strong,No
Warm,Strong,No
Warm,None,Yes
Hot,None,No
Hot,Breeze,Yes
Warm,Breeze,Yes
Cold,Breeze,No
Cold,None,Yes
Hot,Strong,Yes
Warm,None,Yes
Warm,Strong,?

Output:

We input the saved CSV file into the program naive_bayes.py. We get the following result:

python naive_bayes.py chess_reduced.csv
[['Warm', 'Strong', {'Yes': 0.49999999999999994, 'No': 0.5}]]
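Assuming that naive_bayes.py estimates each probability as a plain relative frequency from the ten labelled rows (6 with Yes, 4 with No, no smoothing), the 50/50 split can be checked by hand:

```latex
\begin{align*}
P(\text{Yes})\,P(\text{Warm}\mid\text{Yes})\,P(\text{Strong}\mid\text{Yes})
  &= \tfrac{6}{10}\cdot\tfrac{3}{6}\cdot\tfrac{1}{6} = \tfrac{1}{20},\\
P(\text{No})\,P(\text{Warm}\mid\text{No})\,P(\text{Strong}\mid\text{No})
  &= \tfrac{4}{10}\cdot\tfrac{1}{4}\cdot\tfrac{2}{4} = \tfrac{1}{20},\\
P(\text{Yes}\mid\text{Warm},\text{Strong})
  &= \frac{1/20}{1/20 + 1/20} = \tfrac{1}{2} = P(\text{No}\mid\text{Warm},\text{Strong}).
\end{align*}
```

Both classes receive the same unnormalized score, so after normalization each gets probability 1/2.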

The first class, Yes, is true with a probability of 50%; the tiny deviation from 0.5 in the output comes from rounding in Python's floating-point arithmetic. The second class, No, has the same probability, 50%, of being true. Thus, with the data that we have, we cannot draw a reasonable conclusion about the class of the vector (Warm, Strong). However, we have probably noticed that this vector already occurs in the table with the class No. Hence, our guess would be that this vector belongs to the class No, the only class it has been observed with. But to have greater statistical confidence, we would need more data or more independent variables.
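To make the computation concrete, here is a minimal Python sketch of a frequency-based naive Bayes classifier for chess_reduced.csv. It is not the book's naive_bayes.py: the names `load_rows` and `naive_bayes_posteriors` are hypothetical, and it assumes plain relative frequencies without smoothing, which is consistent with the 50/50 output above. It also shows what happens if the duplicate (Warm, None, Yes) row is removed.

```python
import csv
import io
from collections import Counter

# Contents of chess_reduced.csv, embedded so that the sketch is self-contained.
CSV_TEXT = """Temperature,Wind,Play
Cold,Strong,No
Warm,Strong,No
Warm,None,Yes
Hot,None,No
Hot,Breeze,Yes
Warm,Breeze,Yes
Cold,Breeze,No
Cold,None,Yes
Hot,Strong,Yes
Warm,None,Yes
Warm,Strong,?
"""


def load_rows(text):
    """Split CSV rows into labelled training rows and query rows marked '?'."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    known = [row for row in rows if row[-1] != "?"]
    unknown = [row[:-1] for row in rows if row[-1] == "?"]
    return header, known, unknown


def naive_bayes_posteriors(known, features):
    """Return P(class | features) using raw relative frequencies (no smoothing)."""
    class_counts = Counter(row[-1] for row in known)
    # feature_counts[(column index, value, class)] = number of matching rows
    feature_counts = Counter(
        (i, value, row[-1]) for row in known for i, value in enumerate(row[:-1])
    )
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / len(known)  # prior P(class)
        for i, value in enumerate(features):
            score *= feature_counts[(i, value, cls)] / n_cls  # P(value | class)
        scores[cls] = score
    total = sum(scores.values())
    if total == 0:  # every class has a zero count for some feature value
        return scores
    return {cls: score / total for cls, score in scores.items()}


header, known, unknown = load_rows(CSV_TEXT)
for features in unknown:
    print(features, naive_bayes_posteriors(known, features))

# Removing the duplicate (Warm, None, Yes) row changes the frequencies.
deduplicated = [list(row) for row in dict.fromkeys(tuple(row) for row in known)]
print(naive_bayes_posteriors(deduplicated, ["Warm", "Strong"]))
```

With all ten rows, both classes come out at about 0.5 (up to floating-point rounding). Without the duplicate row, the probability of Yes drops to 4/9 (about 0.44), which illustrates why the duplicate carries information.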
