Playing chess - dependent events

Suppose that we would again like to find out whether our friend would play chess with us in a park in Cambridge, UK. This time, however, we are given different input data:

Temperature	Wind	Season	Play
Cold	Strong	Winter	No
Warm	Strong	Autumn	No
Warm	None	Summer	Yes
Hot	None	Spring	No
Hot	Breeze	Autumn	Yes
Warm	Breeze	Spring	Yes
Cold	Breeze	Winter	No
Cold	None	Spring	Yes
Hot	Strong	Summer	Yes
Warm	None	Autumn	Yes
Warm	Strong	Spring	?

So, we wonder how the answer will change with this different data.

Analysis:

We may be tempted to use Bayesian probability to calculate the probability of our friend playing chess with us in the park. However, we should be careful and ask whether the events are independent of each other.

In the previous example, where we used Bayesian probability, we were given the random variables Temperature, Wind, and Sunshine. These are reasonably independent. Common sense tells us that a specific temperature or amount of sunshine does not correlate strongly with a specific wind speed. It is true that sunny weather results in higher temperatures, but sunny weather is common even when temperatures are very low. Hence, we considered even sunshine and temperature reasonably independent as random variables and applied Bayes' theorem.

However, in this example, temperature and season are tightly related, especially in a location such as the UK, where our park is located. Unlike in places closer to the equator, temperatures in the UK vary greatly throughout the year. Winters are cold and summers are hot. Spring and autumn have temperatures in between.
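As a rough illustration, we can compare how often Cold occurs overall with how often it occurs in Winter, using the ten labelled rows of the table above. If the two variables were independent, the two estimates would be close. This is only a sketch: the helper names are our own, and ten rows are far too few to prove dependence, so the common-sense argument remains the main justification.

```python
from collections import Counter

# (Temperature, Season) pairs of the ten labelled rows in the table above.
rows = [
    ("Cold", "Winter"), ("Warm", "Autumn"), ("Warm", "Summer"),
    ("Hot", "Spring"), ("Hot", "Autumn"), ("Warm", "Spring"),
    ("Cold", "Winter"), ("Cold", "Spring"), ("Hot", "Summer"),
    ("Warm", "Autumn"),
]

pair_counts = Counter(rows)
season_counts = Counter(season for _, season in rows)


def temperature_marginal(temperature):
    """Estimate P(temperature) by relative frequency."""
    return sum(1 for t, _ in rows if t == temperature) / len(rows)


def temperature_given_season(temperature, season):
    """Estimate P(temperature | season) by relative frequency."""
    return pair_counts[(temperature, season)] / season_counts[season]


# Under independence these two estimates would be roughly equal.
print(temperature_marginal("Cold"))                # 0.3
print(temperature_given_season("Cold", "Winter"))  # 1.0
```

In this sample, every Winter row is Cold, while Cold makes up only 30% of all rows, which is consistent with the two variables being dependent.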

Therefore, we cannot apply Bayes' theorem as before, since that approach assumes independent random variables and these are dependent. However, we could still perform some analysis using Bayes' theorem on partial data. By eliminating enough of the dependent variables, the remaining ones could turn out to be independent. Since temperature is a more specific variable than season, and the two variables are dependent, let us keep only the temperature variable. The remaining two variables, temperature and wind, are reasonably independent.

Thus, we get the following data:

Temperature	Wind	Play
Cold	Strong	No
Warm	Strong	No
Warm	None	Yes
Hot	None	No
Hot	Breeze	Yes
Warm	Breeze	Yes
Cold	Breeze	No
Cold	None	Yes
Hot	Strong	Yes
Warm	None	Yes
Warm	Strong	?

Removing the season produces duplicate rows: (Warm, None, Yes) now appears twice. We can keep the duplicates, as they give us greater evidence for the occurrence of that specific data row.

Input:

Saving the table, we get the following CSV file:

# source_code/2/chess_reduced.csv
Temperature,Wind,Play
Cold,Strong,No
Warm,Strong,No
Warm,None,Yes
Hot,None,No
Hot,Breeze,Yes
Warm,Breeze,Yes
Cold,Breeze,No
Cold,None,Yes
Hot,Strong,Yes
Warm,None,Yes
Warm,Strong,?
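The reduced file could also be produced from the full table rather than typed by hand. The following is a minimal sketch using Python's standard csv module. The function name drop_column and the in-memory FULL_TABLE string are our own illustrative choices, not part of the book's source code.

```python
import csv
import io

# The full table from the start of this section, as CSV text.
FULL_TABLE = """Temperature,Wind,Season,Play
Cold,Strong,Winter,No
Warm,Strong,Autumn,No
Warm,None,Summer,Yes
Hot,None,Spring,No
Hot,Breeze,Autumn,Yes
Warm,Breeze,Spring,Yes
Cold,Breeze,Winter,No
Cold,None,Spring,Yes
Hot,Strong,Summer,Yes
Warm,None,Autumn,Yes
Warm,Strong,Spring,?
"""


def drop_column(csv_text, column):
    """Return the CSV text with the named column removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name != column]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped column in each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


reduced = drop_column(FULL_TABLE, "Season")
print(reduced)
```

Duplicate rows are deliberately left in place, since each one counts as separate evidence.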

Output:

We input the saved CSV file into the program naive_bayes.py. We get the following result:

python naive_bayes.py chess_reduced.csv
[['Warm', 'Strong', {'Yes': 0.49999999999999994, 'No': 0.5}]]

The first class, Yes, has a probability of 50%. The tiny deviation from 0.5 in the output comes from Python's inexact floating-point arithmetic. The second class, No, has the same probability of 50%. Thus, with the data we have, we cannot draw a reasonable conclusion about the class of the vector (Warm, Strong). However, we may have noticed that this vector already occurs in the table with the resulting class No. Hence, our guess would be that this vector belongs to the class No. But to have greater statistical confidence, we would need more data or more independent variables.
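The 50% result can be verified by hand. Of the ten labelled rows, six are Yes and four are No, so the Yes score is 6/10 × 3/6 × 1/6 = 1/20 and the No score is 4/10 × 1/4 × 2/4 = 1/20. The sketch below repeats this calculation with exact fractions, which confirms that the deviation from 0.5 in the program output is only a floating-point artifact. It is our own minimal reimplementation for this check, not the book's naive_bayes.py, and it applies no smoothing.

```python
from collections import Counter
from fractions import Fraction

# The ten labelled rows of the reduced table: (Temperature, Wind, Play).
rows = [
    ("Cold", "Strong", "No"), ("Warm", "Strong", "No"),
    ("Warm", "None", "Yes"), ("Hot", "None", "No"),
    ("Hot", "Breeze", "Yes"), ("Warm", "Breeze", "Yes"),
    ("Cold", "Breeze", "No"), ("Cold", "None", "Yes"),
    ("Hot", "Strong", "Yes"), ("Warm", "None", "Yes"),
]


def naive_bayes_posterior(rows, query):
    """Return exact class probabilities for query under naive Bayes."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        # Prior P(label) times the product of P(feature value | label).
        score = Fraction(n_label, len(rows))
        for i, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[i] == value)
            score *= Fraction(matches, n_label)
        scores[label] = score
    total = sum(scores.values())
    # Normalize so that the class probabilities sum to 1.
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior(rows, ("Warm", "Strong"))
print(posterior)  # both classes come out as exactly 1/2
```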
