Understanding the Boston Housing dataset

In this chapter, we will use six libraries, which are loaded in the following code:

# Libraries
library(keras)
library(mlbench)
library(psych)
library(dplyr)
library(magrittr)
library(neuralnet)

The structure of the BostonHousing data is as follows:

# Data structure
data(BostonHousing)
str(BostonHousing)

OUTPUT
'data.frame': 506 obs. of 14 variables:
 $ crim   : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
 $ zn     : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
 $ indus  : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
 $ chas   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ nox    : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
 $ rm     : num 6.58 6.42 7.18 7 7.15 ...
 $ age    : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
 $ dis    : num 4.09 4.97 4.97 6.06 6.06 ...
 $ rad    : num 1 2 2 3 3 3 5 5 5 5 ...
 $ tax    : num 296 242 242 222 222 222 311 311 311 311 ...
 $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
 $ b      : num 397 397 393 395 397 ...
 $ lstat  : num 4.98 9.14 4.03 2.94 5.33 ...
 $ medv   : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...

As you can see from the preceding output, this dataset has 506 observations and 14 variables. Of the 14 variables, 13 are numeric and 1 (chas) is a factor. The last variable, medv (the median value of owner-occupied homes, in thousands of US dollars), is the dependent, or target, variable. The remaining 13 variables are the independent variables, or predictors. The following table gives a brief description of the variables for easy reference:
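The counts read off the str() output can also be checked programmatically. The following is a minimal sketch using base R only; the object names var_classes, factor_vars, target, and predictors are illustrative and do not come from the chapter:

```r
# Load the dataset from the mlbench package
library(mlbench)
data(BostonHousing)

# Class of each column (every column here has a single class)
var_classes <- sapply(BostonHousing, class)

# Number of columns of each class
table(var_classes)

# Names of the factor-type columns
factor_vars <- names(var_classes)[var_classes == "factor"]
factor_vars

# Separate the target variable from the predictors
# (the names 'target' and 'predictors' are illustrative)
target <- "medv"
predictors <- setdiff(names(BostonHousing), target)
length(predictors)
```

Keeping the predictor names in a separate vector makes it easier to select the same columns consistently later on, for example when splitting the data or building a model formula.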

Variable  Description
crim      Per-capita crime rate by town
zn        Proportion of residential land zoned for lots over 25,000 sq ft
indus     Proportion of nonretail business acres per town
chas      Charles River dummy variable (1 if the tract bounds the river; 0 otherwise)
nox       Nitric-oxides concentration (parts per 10 million)
rm        Average number of rooms per dwelling
age       Proportion of owner-occupied units built prior to 1940
dis       Weighted distances to five Boston employment centers
rad       Index of accessibility to radial highways
tax       Full-value property-tax rate per 10,000 USD
ptratio   Pupil–teacher ratio by town
b         1000(B - 0.63)^2, where B is the proportion of Black residents by town
lstat     Percentage of lower-income status members of the population
medv      Median value of owner-occupied homes, in thousands of USD
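To relate these descriptions to the actual values, it can help to look at a couple of columns directly. The following is a small sketch, assuming only base R and the mlbench package; the object names chas_counts and medv_summary are illustrative:

```r
# Load the dataset from the mlbench package
library(mlbench)
data(BostonHousing)

# chas is a two-level factor: count the tracts at each level
chas_counts <- table(BostonHousing$chas)
chas_counts

# medv is the numeric target variable: five-number summary plus the mean
medv_summary <- summary(BostonHousing$medv)
medv_summary
```

A frequency table suits the factor variable, while summary() gives a quick view of the range and center of a numeric variable such as medv.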

 

This data is based on the 1970 census. A detailed statistical study using this data was published by Harrison and Rubinfeld in 1978 (reference: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.926.5532&rep=rep1&type=pdf).
