Getting ready

Let's use some different housing data from the City of Saratoga, CA. This time, we will look at the lot size and house price:

Lot size House price (in $1,000)
12,839 2,405
10,000 2,200
8,040 1,400
13,104 1,800
10,000 2,351
3,049 795
38,768 2,725
16,250 2,150
43,026 2,724
44,431 2,675
40,000 2,930
1,260 870
15,000 2,210
10,032 1,145
12,420 2,419
69,696 2,750
12,600 2,035
10,240 1,150
876 665
8,125 1,430
11,792 1,920
1,512 1,230
1,276 975
67,518 2,400
9,810 1,725
6,324 2,300
12,510 1,700
15,616 1,915
15,476 2,278
13,390 2,497.5
1,158 725
2,000 870
2,614 730
13,433 2,050
12,500 3,330
15,750 1,120
13,996 4,100
10,450 1,655
7,500 1,550
12,125 2,100
14,500 2,100
10,000 1,175
10,019 2,047.5
48,787 3,998
53,579 2,688
10,788 2,251
11,865 1,906

Let's convert this data into a comma-separated values (CSV) file called saratoga.csv and draw it as a scatter plot, with lot size on one axis and house price on the other.
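As a rough sketch of that step, the following Python writes a few rows to saratoga.csv and reads them back for plotting. Only the first five rows of the table are used, to keep the example short. The header-less file layout, the helper names (write_csv, read_csv, plot_scatter), and the use of matplotlib for the plot are assumptions made for illustration, not something the recipe prescribes.

```python
import csv
import os
import tempfile

# First five rows of the table above: (lot size, house price in $1,000).
SAMPLE_ROWS = [
    (12839, 2405.0),
    (10000, 2200.0),
    (8040, 1400.0),
    (13104, 1800.0),
    (10000, 2351.0),
]


def write_csv(rows, path):
    """Write (lot_size, price) pairs to `path`, one pair per line, no header."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def read_csv(path):
    """Read the file back as a list of (float, float) tuples."""
    with open(path, newline="") as f:
        return [(float(lot), float(price)) for lot, price in csv.reader(f)]


def plot_scatter(rows):
    """Scatter plot of price against lot size (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt  # imported lazily; optional dependency

    lots, prices = zip(*rows)
    plt.scatter(lots, prices)
    plt.xlabel("Lot size")
    plt.ylabel("House price (in $1,000)")
    plt.show()


# Written to a temporary directory here; use "saratoga.csv" directly in practice.
csv_path = os.path.join(tempfile.mkdtemp(), "saratoga.csv")
write_csv(SAMPLE_ROWS, csv_path)
points = read_csv(csv_path)
# plot_scatter(points)  # uncomment to display the plot
```

Note that the thousands separators in the table have to be dropped before writing, since a value such as 12,839 would otherwise be split into two CSV fields.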

Finding the number of clusters is a tricky task. Here, we have the advantage of visual inspection, which is not available for higher-dimensional data (more than three dimensions). Let's roughly divide the data into four clusters as follows:

Next, we will run the k-means algorithm on the same data and see how close its clusters come to ours.
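To make the idea concrete before running it, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python with k = 4. It is an illustration only and may differ from the implementation the recipe goes on to use. The function names, the seed, and the choice of a 12-row subset of the table are assumptions made to keep the example short.

```python
import random

# A subset of the table above: (lot size, house price in $1,000).
POINTS = [
    (12839, 2405), (10000, 2200), (8040, 1400), (3049, 795),
    (38768, 2725), (43026, 2724), (44431, 2675), (1260, 870),
    (69696, 2750), (876, 665), (67518, 2400), (48787, 3998),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels) after alternating assign and update steps."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    # Recompute labels so they always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = kmeans(POINTS, k=4)
for point, label in zip(POINTS, labels):
    print(label, point)
```

Because lot sizes are numerically much larger than prices, the distance in this sketch is dominated by lot size. Rescaling both columns to a comparable range before clustering is a common remedy, though whether it is appropriate depends on the data.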
