Dataset naming

The name for this dataset is simply Boston. It has two prototasks: nox, in which the nitrous oxide level is to be predicted; and price, in which the median value of a home is to be predicted.

Miscellaneous details about the dataset are as follows:

Origin: The origin of the Boston housing data is Natural.
Usage: This dataset may be used for assessment.
Number of cases: The dataset contains a total of 506 cases.
Order: The order of the cases is mysterious.
Variables: There are 14 attributes in each case of the dataset. They are the following:
- CRIM: Per capita crime rate by town
- ZN: Proportion of residential land zoned for lots over 25,000 sq. ft.
- INDUS: Proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX: Nitric oxide concentration (parts per 10 million)
- RM: Average number of rooms per dwelling
- AGE: Proportion of owner-occupied units built prior to 1940
- DIS: Weighted distances to five Boston employment centers
- RAD: Index of accessibility to radial highways
- TAX: Full-value property-tax rate per $10,000
- PTRATIO: Pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT: Percentage lower status of the population
- MEDV: Median value of owner-occupied homes in $1000s
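To make the layout of a case concrete, here is a minimal sketch that maps one line of the data to the 14 attribute names listed above. It assumes the data is stored as whitespace-separated numbers, one case per line; the helper name `parse_case` and the sample line are illustrative only and are not part of the dataset's own tooling.

```python
# Attribute names in the order listed above.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]


def parse_case(line):
    """Turn one whitespace-separated line into a {attribute: float} dict."""
    values = [float(token) for token in line.split()]
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    return dict(zip(COLUMNS, values))


# Sample line used only to illustrate the assumed layout (14 numbers per case).
sample_line = ("0.00632 18.00 2.310 0 0.5380 6.5750 65.20 "
               "4.0900 1 296.0 15.30 396.90 4.98 24.00")
case = parse_case(sample_line)
print(case["DIS"], case["MEDV"])
```

Keeping the names in one list means the same order can be reused later when picking DIS as the input and MEDV as the value to predict.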

We will try out both simple linear regression and multivariate regression on this dataset using Excel, and work through the details. For our analysis, we will consider only the following 20 cases out of the 506 in the Boston dataset:

Now, we can use the Data Analysis option in Excel to predict MV (the MEDV attribute) using DIS alone as the independent variable. In Data Analysis, select Regression, then select MV as the Y value and DIS as the X value. This is a simple regression with one independent variable used to predict the output. The following is the output produced by Excel:

The linear regression equation for predicting MV with DIS as the independent variable is Y = 1.11X + 17.17 (DIS coefficient × DIS value + intercept value):

R² = 0.0250
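As a rough illustration of where a slope, an intercept, and an R² value come from, the following sketch fits a least-squares line in plain Python. It is not Excel's implementation, only the standard closed-form calculation; the function name `fit_simple_regression` and the toy data are assumptions made for the example and are not values from the Boston dataset.

```python
def fit_simple_regression(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("x values are all identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R squared: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1.0 - ss_res / syy if syy else 1.0
    return slope, intercept, r_squared


# Made-up toy data, NOT taken from the Boston dataset.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
slope, intercept, r2 = fit_simple_regression(xs, ys)
print(f"Y = {slope:.2f}X + {intercept:.2f}, R^2 = {r2:.4f}")
```

An R² close to 1 means the line explains most of the variation in Y, while a value close to 0, such as the 0.0250 reported here, means the single input explains very little of it.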

Now, we can see the predicted output of MV for the set of 20 data samples considered for analysis:
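Each predicted value is obtained by substituting a case's DIS into the equation Y = 1.11X + 17.17 given earlier. A minimal sketch of that substitution follows; the function name `predict_mv` and the DIS inputs are hypothetical and are not the 20 cases used in the analysis.

```python
SLOPE = 1.11       # DIS coefficient from the regression equation
INTERCEPT = 17.17  # intercept from the regression equation


def predict_mv(dis):
    """Predict MV from a DIS value using Y = 1.11X + 17.17."""
    return SLOPE * dis + INTERCEPT


# Hypothetical DIS values chosen only to show the calculation.
for dis in [2.0, 4.0, 6.0]:
    print(f"DIS = {dis:.1f} -> predicted MV = {predict_mv(dis):.2f}")
```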

The output chart for the MV predicted with DIS as the independent variable is given as follows:

We now have an understanding of how linear regression works for a single independent variable. In the same way, we can have any number of independent variables by including them as X1, X2, X3, ... XN.

In our dataset, we have 14 variables in total, so we can make MV depend on all of the remaining 13 variables and create the regression equation in the same manner as described previously for a single variable.
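The extension to several inputs can be sketched in plain Python by solving the normal equations (XᵀX)b = Xᵀy. This is one textbook way to obtain the coefficients, not necessarily what Excel does internally; the helper names and the two-input toy data are assumptions made for the example, and with the Boston data each row would hold the 13 input attributes instead.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(rows, ys):
    """Return [intercept, b1, ..., bN] minimizing the squared error."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data generated from Y = 1 + 2*X1 + 3*X2, NOT Boston values.
rows = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_regression(rows, ys)
print([round(c, 6) for c in coefficients])
```

Because the toy outputs are generated exactly from a known equation, the fitted coefficients should recover the intercept and the two slopes up to floating-point rounding.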

Now that we have understood how to perform regression on our Boston dataset using Excel, we will perform the same analysis using Core ML. Before implementing it in Core ML, we must understand what Core ML is and look into its basics.
