Getting ready

Let's look at three features of the housing data for the city of Saratoga, CA: house size, lot size, and price. Using PCA, we will merge the house size and lot size features into a single feature, z. Let's call this feature the z density of a house.

It is worth noting that it is not always possible to give meaning to a newly created feature. In this case, it is easy, as we have only two features to combine and we can use common sense to interpret their combined effect. In a more practical case, you may have 1,000 features that you are trying to project onto 100 features, and it may not be possible to give a real-life meaning to each of those 100 features.

In this exercise, we will derive the housing density using PCA and then we will do linear regression to see how this density affects the house price.
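To make the plan concrete, here is a minimal sketch of the two steps using NumPy. It is only an illustration and not necessarily the tooling this recipe goes on to use. It takes a handful of already-scaled rows from the data table in this recipe, projects the two scaled features onto their first principal component to get z, and then fits a straight line of price against z. The variable names (`rows`, `first_pc`, `z`, `slope`, `intercept`) are our own, and with so few rows the fitted numbers are not meaningful.

```python
import numpy as np

# Scaled house size, scaled lot size, price in $1,000
# (a few rows copied from the data table in this recipe).
rows = np.array([
    [-0.025, -0.231, 2405.0],
    [0.323, -0.400, 2200.0],
    [-0.654, -0.517, 1400.0],
    [-1.105, -0.215, 1800.0],
    [0.142, 1.312, 2725.0],
    [0.949, 1.566, 2724.0],
    [-1.126, -0.814, 795.0],
    [3.887, 1.909, 3998.0],
])
X, price = rows[:, :2], rows[:, 2]

# PCA step: center the two features, then take the first right-singular
# vector, which is the direction of greatest variance.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
first_pc = vt[0]          # unit vector; its sign is arbitrary
z = Xc @ first_pc         # one merged value per house

# Regression step: least-squares fit of price = slope * z + intercept.
A = np.column_stack([z, np.ones_like(z)])
(slope, intercept), *_ = np.linalg.lstsq(A, price, rcond=None)

print("first principal component:", first_pc)
print("price = %.1f * z + %.1f" % (slope, intercept))
```

Because the sign of a principal component is arbitrary, z (and therefore the slope) may come out negated; the fit itself is unaffected.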

There is a preprocessing stage before we delve into PCA: feature scaling. Feature scaling comes into the picture when two features have ranges at very different scales. Here, house size ranges from 800 sq. ft. to 7,000 sq. ft., while lot size ranges from 800 sq. ft. to a few acres.

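Why did we not have to do feature scaling before? The answer is that, until now, we did not need to put features on a level playing field. PCA looks for the directions of greatest variance, so a feature with a much larger numeric range, such as lot size here, would dominate the result if left unscaled. Gradient descent is another area where feature scaling is very useful.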

There are different ways of doing feature scaling:

  • Dividing a feature value by the maximum absolute value, which puts every feature in the -1 ≤ x ≤ 1 range
  • Dividing a feature value by the range, that is, maximum value - minimum value
  • Subtracting the mean from a feature value and then dividing the result by the range
  • Subtracting the mean from a feature value and then dividing the result by the standard deviation
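The four options can be sketched in a few lines of NumPy. This is an illustration only: the function names are our own, the input is just five house sizes from this recipe's data, and the use of the population standard deviation (NumPy's default, `ddof=0`) is an assumption, since the text does not say which variant is meant.

```python
import numpy as np

# Five house sizes (sq. ft.) from this recipe's data, for illustration only.
house_size = np.array([2524.0, 2937.0, 1778.0, 1242.0, 2900.0])

def scale_by_max(x):
    """Divide by the largest absolute value; results lie in [-1, 1]."""
    return x / np.max(np.abs(x))

def scale_by_range(x):
    """Divide by the range (maximum - minimum)."""
    return x / (x.max() - x.min())

def mean_normalize(x):
    """Subtract the mean, then divide by the range."""
    return (x - x.mean()) / (x.max() - x.min())

def standardize(x):
    """Subtract the mean, then divide by the standard deviation.

    Assumption: population standard deviation (ddof=0).
    """
    return (x - x.mean()) / x.std()

for fn in (scale_by_max, scale_by_range, mean_normalize, standardize):
    print(fn.__name__, np.round(fn(house_size), 3))
```

Since the mean and standard deviation here come from only five values, the standardized results will not match the scaled columns of the full dataset.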

We are going to use the fourth choice, that is, subtracting the mean and dividing by the standard deviation. The following is the data we are going to use for this recipe:

House size (sq. ft.)  Lot size (sq. ft.)  Scaled house size  Scaled lot size  House price (in $1,000)
2,524                 12,839              -0.025             -0.231           2,405
2,937                 10,000              0.323              -0.4             2,200
1,778                 8,040               -0.654             -0.517           1,400
1,242                 13,104              -1.105             -0.215           1,800
2,900                 10,000              0.291              -0.4             2,351
1,218                 3,049               -1.126             -0.814           795
2,722                 38,768              0.142              1.312            2,725
2,553                 16,250              -0.001             -0.028           2,150
3,681                 43,026              0.949              1.566            2,724
3,032                 44,431              0.403              1.649            2,675
3,437                 40,000              0.744              1.385            2,930
1,680                 1,260               -0.736             -0.92            870
2,260                 15,000              -0.248             -0.103           2,210
1,660                 10,032              -0.753             -0.398           1,145
3,251                 12,420              0.587              -0.256           2,419
3,039                 69,696              0.409              3.153            2,750
3,401                 12,600              0.714              -0.245           2,035
1,620                 10,240              -0.787             -0.386           1,150
876                   876                 -1.414             -0.943           665
1,889                 8,125               -0.56              -0.512           1,430
4,406                 11,792              1.56               -0.294           1,920
1,885                 1,512               -0.564             -0.905           1,230
1,276                 1,276               -1.077             -0.92            975
3,053                 67,518              0.42               3.023            2,400
2,323                 9,810               -0.195             -0.412           1,725
3,139                 6,324               0.493              -0.619           2,300
2,293                 12,510              -0.22              -0.251           1,700
2,635                 15,616              0.068              -0.066           1,915
2,298                 15,476              -0.216             -0.074           2,278
2,656                 13,390              0.086              -0.198           2,497.5
1,158                 1,158               -1.176             -0.927           725
1,511                 2,000               -0.879             -0.876           870
1,252                 2,614               -1.097             -0.84            730
2,141                 13,433              -0.348             -0.196           2,050
3,565                 12,500              0.852              -0.251           3,330
1,368                 15,750              -0.999             -0.058           1,120
5,726                 13,996              2.672              -0.162           4,100
2,563                 10,450              0.008              -0.373           1,655
1,551                 7,500               -0.845             -0.549           1,550
1,993                 12,125              -0.473             -0.274           2,100
2,555                 14,500              0.001              -0.132           2,100
1,572                 10,000              -0.827             -0.4             1,175
2,764                 10,019              0.177              -0.399           2,047.5
7,168                 48,787              3.887              1.909            3,998
4,392                 53,579              1.548              2.194            2,688
3,096                 10,788              0.457              -0.353           2,251
2,003                 11,865              -0.464             -0.289           1,906

Let's take the scaled house size and scaled lot size data, the two features PCA will merge, and save it as scaledhousedata.csv.
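One way to produce this file is with Python's standard csv module, as in the sketch below. The layout is an assumption on our part: one house per row, holding the two scaled feature columns from the table, comma-separated, with no header line. Only the first five rows are written here to keep the example short.

```python
import csv

# Assumed layout: (scaled house size, scaled lot size), one house per row.
# Values are the first five rows of the table in this recipe.
scaled_rows = [
    (-0.025, -0.231),
    (0.323, -0.4),
    (-0.654, -0.517),
    (-1.105, -0.215),
    (0.291, -0.4),
]

# Write the rows without a header line.
with open("scaledhousedata.csv", "w", newline="") as f:
    csv.writer(f).writerows(scaled_rows)

# Read the file back to confirm it parses as numeric pairs.
with open("scaledhousedata.csv", newline="") as f:
    loaded = [tuple(float(v) for v in row) for row in csv.reader(f)]

print(loaded)
```

If whatever reads the file later expects a header or a different delimiter, adjust the writer accordingly.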
