We have the following data about the shopping preferences of our friend, Jane:
Temperature | Rain   | Shopping
----------- | ------ | --------
Cold        | None   | Yes
Warm        | None   | No
Cold        | Strong | Yes
Cold        | None   | No
Warm        | Strong | No
Warm        | None   | Yes
Cold        | None   | ?
We would like to find out, using a decision tree, whether Jane would go shopping if the outside temperature were cold with no rain.
Analysis:
Here we should be careful, as there are instances in the data that have the same attribute values but different classes, namely (Cold, None, Yes) and (Cold, None, No). The program we made would form the following decision tree:
Root
├── [Temperature=Cold]
│   ├── [Rain=None]
│   │   └── [Shopping=Yes]
│   └── [Rain=Strong]
│       └── [Shopping=Yes]
└── [Temperature=Warm]
    ├── [Rain=None]
    │   └── [Shopping=No]
    └── [Rain=Strong]
        └── [Shopping=No]
But the leaf node [Rain=None] with the parent [Temperature=Cold] contains two data samples, one of each class, Yes and No. We therefore cannot classify the instance (Cold, None, ?) accurately. For the decision tree algorithm to work better, we would have to either assign to such a leaf node the class with the greatest weight, that is, the majority class (here the two classes are tied with one sample each, so even a majority vote would be inconclusive), or, even better, collect values for more attributes for the data samples so that we can make the decision more accurately.
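The class conflict at that leaf can be checked programmatically. The following is a minimal sketch (the function name `leaf_distribution` and the in-code data representation are illustrative, not from the original program): it counts the classes of all training samples that would reach a given leaf, which is also the basis for a majority-class decision.

```python
from collections import Counter

# Jane's shopping data from the table above, as (Temperature, Rain, Shopping).
data = [
    ("Cold", "None", "Yes"),
    ("Warm", "None", "No"),
    ("Cold", "Strong", "Yes"),
    ("Cold", "None", "No"),
    ("Warm", "Strong", "No"),
    ("Warm", "None", "Yes"),
]

def leaf_distribution(temperature, rain):
    """Count the classes of the samples reaching the leaf (temperature, rain)."""
    return Counter(label for t, r, label in data
                   if t == temperature and r == rain)

# The leaf relevant to the question (Cold, None, ?):
dist = leaf_distribution("Cold", "None")
print(dist)  # one Yes and one No: the classes are tied

# A majority vote would pick the most common class at the leaf,
# but with a 1-1 tie the choice is arbitrary.
majority_class, count = dist.most_common(1)[0]
print(majority_class, count)
```

Running this confirms the analysis: the (Cold, None) leaf holds one Yes and one No sample, so no class has the greatest weight there.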
Therefore, given the available data, we are uncertain whether Jane would go shopping or not.