Getting ready

Let's use some different housing data from the City of Saratoga, CA. This time, we will look at the lot size and house price:

Lot size	House price (in $1,000)
12,839	2,405
10,000	2,200
8,040	1,400
13,104	1,800
10,000	2,351
3,049	795
38,768	2,725
16,250	2,150
43,026	2,724
44,431	2,675
40,000	2,930
1,260	870
15,000	2,210
10,032	1,145
12,420	2,419
69,696	2,750
12,600	2,035
10,240	1,150
876	665
8,125	1,430
11,792	1,920
1,512	1,230
1,276	975
67,518	2,400
9,810	1,725
6,324	2,300
12,510	1,700
15,616	1,915
15,476	2,278
13,390	2,497.5
1,158	725
2,000	870
2,614	730
13,433	2,050
12,500	3,330
15,750	1,120
13,996	4,100
10,450	1,655
7,500	1,550
12,125	2,100
14,500	2,100
10,000	1,175
10,019	2,047.5
48,787	3,998
53,579	2,688
10,788	2,251
11,865	1,906

Let's convert this data into a comma-separated values (CSV) file called saratoga.csv and draw it as a scatter plot.
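The conversion can be sketched in a few lines of Python. This is only an illustration under some assumptions: the thousands separators are dropped so that each value is a plain number, the file has no header row, only the first five rows of the table are shown, and the helper names write_csv, read_csv, and plot_points are made up for this example. The plotting helper assumes matplotlib is installed, so it is defined but not called.

```python
import csv

# First five rows of the table above as (lot size, house price in $1,000).
# Thousands separators are removed; the remaining rows would be added the same way.
rows = [
    (12839, 2405),
    (10000, 2200),
    (8040, 1400),
    (13104, 1800),
    (10000, 2351),
]


def write_csv(path, data):
    """Write (lot_size, price) pairs to a headerless CSV file."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(data)


def read_csv(path):
    """Read the file back as a list of (lot_size, price) float pairs."""
    with open(path, newline="") as f:
        return [(float(lot), float(price)) for lot, price in csv.reader(f)]


def plot_points(points):
    """Draw the scatter plot; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    lot_sizes, prices = zip(*points)
    plt.scatter(lot_sizes, prices)
    plt.xlabel("Lot size")
    plt.ylabel("House price (in $1,000)")
    plt.show()


write_csv("saratoga.csv", rows)
points = read_csv("saratoga.csv")
# plot_points(points)  # uncomment to draw the scatter plot
```

Values are read back as floats because some prices in the table, such as 2,497.5, are not whole numbers.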

Finding the number of clusters is a tricky task. Here, we have the advantage of visual inspection, which is not available for data with more than three dimensions. Let's roughly divide the data into four clusters as follows:

We will then run the k-means algorithm to do the same and see how close its results come to ours.
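To get a feel for what k-means does with this data, here is a minimal plain-Python sketch of the standard iteration (Lloyd's algorithm) with four clusters. It is an illustration, not necessarily the implementation this recipe goes on to use. It works on a subset of rows from the table, and the function names nearest and kmeans, the fixed seed, and the iteration cap are assumptions made for this example.

```python
import math
import random

# A subset of rows from the table above: (lot size, house price in $1,000).
points = [
    (876, 665), (1260, 870), (1512, 1230), (3049, 795),
    (10000, 2200), (12839, 2405), (13104, 1800), (16250, 2150),
    (38768, 2725), (43026, 2724), (44431, 2675),
    (69696, 2750), (67518, 2400),
]


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[i])  # empty cluster: keep old centroid
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    # Recompute labels so they always match the returned centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


centroids, labels = kmeans(points, k=4)
for center in centroids:
    print(center)
```

Because lot sizes are much larger numbers than prices, the Euclidean distance here is dominated by lot size, and the result depends on the randomly chosen starting centroids.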
