Naming the dataset

We will be using the Breast Cancer dataset. The following list describes the attributes recorded for each sample in the dataset:

  • ID number 
  • Diagnosis (M = malignant, and B = benign) 
  • 10 real-valued features are computed for each cell nucleus:
    • Radius (mean of the distances from the center to points on the perimeter) 
    • Texture (standard deviation of gray scale values) 
    • Perimeter 
    • Area 
    • Smoothness (local variation in radius lengths) 
    • Compactness (perimeter^2/area - 1.0)
    • Concavity (severity of concave portions of the contour) 
    • Concave points (number of concave portions of the contour) 
    • Symmetry 
    • Fractal dimension ("coastline approximation" - 1)
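
To make one of these definitions concrete, the following minimal Python sketch evaluates the compactness formula from the list (perimeter^2/area - 1.0) on two idealized shapes. The function name compactness and the test shapes are assumptions chosen for illustration; they are not part of the dataset or of its tooling.

```python
import math


def compactness(perimeter: float, area: float) -> float:
    """Compactness as given in the feature list: perimeter^2 / area - 1.0."""
    if area <= 0:
        raise ValueError("area must be positive")
    return perimeter ** 2 / area - 1.0


# Illustrative shapes only (not values taken from the dataset).
r = 10.0
circle = compactness(2 * math.pi * r, math.pi * r ** 2)  # equals 4*pi - 1
square = compactness(40.0, 100.0)                        # 10 x 10 square

print(f"circle: {circle:.3f}")
print(f"square: {square:.3f}")
```

For a given area, a circle has the shortest possible perimeter, so it yields the smallest compactness value; less regular outlines score higher.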

We will apply random forest to the breast cancer dataset in Excel to understand the algorithm in detail. For the purposes of analysis, we will consider only the data elements from the dataset's 569 samples.
