Categorical array

Categorical variables hold data whose values belong to a finite set of discrete categories. These categories may have a natural order or may be unordered. A variable is categorical unordered if the property being recorded takes discrete values that cannot be ordered. The values assigned to the categories then have no meaning other than to identify a category and distinguish it from the others (man, woman). Conversely, a variable is categorical ordered if the property being recorded takes discrete values that can be ordered. In this case, each category is assigned a value that reflects its order relationship to the others (1, 2, 3, 4, ...).

To create a categorical array from a numeric array, a logical array, a cell array of character vectors, a character array, or an existing categorical array, use the categorical() function. As an example, we will reuse the data analyzed in the previous paragraph, in particular the Sex and Age variables, which contain the gender and age of patients in a hospital. The following commands convert these variables into categorical arrays:

>> SexC=categorical(Sex);
>> categories(SexC)
ans =
2×1 cell array
'Female'
'Male'
>> AgeC=categorical(Age);
>> categories(AgeC)
ans =
25×1 cell array
'25' '27' '28' '29' '30' '31' '32' '33' '34' '35' '36' '37' '38' '39' '40' '41' '42' '43' '44' '45' '46' '47' '48' '49' '50'

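Notice that the categories of the SexC array are listed in alphabetical order (categorical unordered), while those of AgeC are listed in ascending numeric order. Note, however, that categorical() creates a non-ordinal array by default: MATLAB treats AgeC as ordered (categorical ordered) only if the 'Ordinal', true option is specified when the array is created.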
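Whether MATLAB treats a categorical array as ordinal can be checked with isordinal(). The following is a minimal sketch, not taken from the patient data: the sizes variable and its values are hypothetical, and it assumes that an explicit category order is passed to categorical() together with the 'Ordinal' name-value pair.

```matlab
% Hypothetical data, used only for illustration (not the patient data set)
sizes = {'medium'; 'small'; 'large'; 'small'};

% Default call: categories are sorted alphabetically, with no mathematical order
sizesC = categorical(sizes);
tfDefault = isordinal(sizesC);

% Explicit category order plus 'Ordinal', true requests small < medium < large
valueset = {'small', 'medium', 'large'};
sizesO = categorical(sizes, valueset, 'Ordinal', true);
tfOrdinal = isordinal(sizesO);

% Relational operators are defined between elements of an ordinal array
isBigger = sizesO(1) > sizesO(2);   % compares 'medium' with 'small'
```

With an ordinal array, functions such as min(), max(), and sort() follow the category order rather than the alphabetical one.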

In the example that follows, we will create an ordinal categorical array by binning numeric data. Recall that data binning is a preprocessing technique used to reduce the effects of observation errors: the original values that fall within a given range (bin) are replaced by a value representative of that range, often its central value. We will use the discretize() function to create a categorical array by binning the values of the Age variable. We will divide the range of values [25, 50] into three bins (25-33, 33-41, 41-50); each bin includes its left edge but not its right edge, except for the last bin, which includes both edges. We will also provide a name for each of the three bins:

>> NameBin = {'FirstBin', 'SecondBin', 'ThirdBin'};
>> AgeBin = discretize(Age,[25 33 41 50],'categorical',NameBin);

This creates a 100×1 ordinal categorical array with three categories such that:

FirstBin   <   SecondBin   <   ThirdBin

Use the summary() function to print the number of items in each category:

>> summary(AgeBin)
FirstBin 27
SecondBin 33
ThirdBin 40
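To see how discretize() treats values that fall exactly on a bin edge, the following sketch bins a short vector of ages. The ages vector is hypothetical and is not part of the patient data; countcats() is used here, in place of summary(), to return the per-category counts as a numeric vector.

```matlab
% Hypothetical ages chosen to sit on or near the bin edges (not the patient data)
ages = [25; 32; 33; 40; 41; 50];
edges = [25 33 41 50];
binNames = {'FirstBin', 'SecondBin', 'ThirdBin'};

% Each bin includes its left edge; the last bin also includes its right edge
ageBins = discretize(ages, edges, 'categorical', binNames);

% Number of elements per category, returned in category order
counts = countcats(ageBins);

% Because FirstBin < SecondBin < ThirdBin, relational comparisons are allowed
inUpperBins = ageBins >= 'SecondBin';
```

In this sketch, 33 and 41 fall into SecondBin and ThirdBin respectively, and 50 is still assigned to ThirdBin because the last bin is closed on the right.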