Going shopping - dealing with data inconsistency

We have the following data about the shopping preferences of our friend, Jane:

| Temperature | Rain   | Shopping |
|-------------|--------|----------|
| Cold        | None   | Yes      |
| Warm        | None   | No       |
| Cold        | Strong | Yes      |
| Cold        | None   | No       |
| Warm        | Strong | No       |
| Warm        | None   | Yes      |
| Cold        | None   | ?        |

We would like to find out, using a decision tree, whether Jane would go shopping if the outside temperature were cold and there was no rain.
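Before building the tree, it is worth checking the data for this kind of inconsistency programmatically. The sketch below is illustrative, not the program from the exercise; the names `data` and `class_counts` are assumptions. It groups the samples by their attribute values and counts the class labels in each group:

```python
from collections import Counter

# Jane's shopping history as (temperature, rain, shopping) tuples
data = [
    ("cold", "none",   "yes"),
    ("warm", "none",   "no"),
    ("cold", "strong", "yes"),
    ("cold", "none",   "no"),
    ("warm", "strong", "no"),
    ("warm", "none",   "yes"),
]

def class_counts(temperature, rain):
    """Count the class labels among samples matching the given attribute values."""
    return Counter(label for t, r, label in data
                   if t == temperature and r == rain)

# (cold, none) yields both classes, so the data is inconsistent there
print(class_counts("cold", "none"))
```

If the resulting counter contains more than one class, no deterministic tree can classify that attribute combination without error.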

Analysis:

Here we should be careful, as there are data instances that have identical values for the same attributes but different classes; namely, (Cold, None, Yes) and (Cold, None, No). The program we made would form the following decision tree:

    Root
    ├── [Temperature=Cold]
    │    ├── [Rain=None]
    │    │    └── [Shopping=Yes]
    │    └── [Rain=Strong]
    │         └── [Shopping=Yes]
    └── [Temperature=Warm]
         ├── [Rain=None]
         │    └── [Shopping=No]
         └── [Rain=Strong]
              └── [Shopping=No]
  

But at the leaf node [Rain=None] with the parent [Temperature=Cold], there are two data samples with different classes, No and Yes. We therefore cannot classify an instance (Cold, None, ?) accurately. (The same conflict occurs at the [Rain=None] leaf under [Temperature=Warm], where both (Warm, None, No) and (Warm, None, Yes) appear.) To make the decision tree algorithm work better, we could assign to such a leaf node the class with the greatest weight, that is, the majority class. Even better would be to collect values for more attributes for the data samples, so that the conflicting instances can be separated and a decision made more accurately.
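The majority-class fallback can be sketched as follows. This is a minimal illustration under assumed names (`data`, `predict` are not from the original program); when the two classes have equal weight, as with (Cold, None), the tie is reported rather than resolved arbitrarily:

```python
from collections import Counter

# Jane's shopping history as (temperature, rain, shopping) tuples
data = [
    ("cold", "none",   "yes"),
    ("warm", "none",   "no"),
    ("cold", "strong", "yes"),
    ("cold", "none",   "no"),
    ("warm", "strong", "no"),
    ("warm", "none",   "yes"),
]

def predict(temperature, rain):
    """Classify by majority vote at the leaf; report ties as unresolved."""
    counts = Counter(label for t, r, label in data
                     if t == temperature and r == rain)
    if not counts:
        return "unknown"          # no matching training samples
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return "tie"              # equal weight for both classes
    return top

print(predict("cold", "none"))    # one Yes and one No: still undecidable
print(predict("cold", "strong"))  # unambiguous leaf
```

With the given data, the (Cold, None) leaf is an exact tie, which is precisely why the exercise concludes that the answer is uncertain.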

Therefore, given the available data, we are uncertain whether Jane would go shopping or not.
