Histogram

A histogram is a graphical representation of a numerical distribution that shows its shape. It consists of adjacent rectangles (bins) whose bases lie on an oriented axis carrying the unit of measure of the variable being studied; this axis can safely be taken as the X axis. The adjacency of the rectangles reflects the continuity of the variable. Each rectangle has a base equal to the width of the corresponding class, and its height is a frequency density: the ratio of the absolute frequency of the class to the class width. The area of each rectangle is therefore proportional to the frequency of its class.
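
The definition of frequency density can be checked with a few lines of arithmetic. The following is only a minimal sketch: the class boundaries and frequencies are hypothetical values chosen for illustration, not data from the survey used later in this section.

```matlab
% Hypothetical grouped data: three classes of unequal width.
edges  = [0 10 20 50];        % class boundaries (assumed values)
counts = [4 6 10];            % absolute frequency of each class (assumed values)

widths  = diff(edges);        % class widths: [10 10 30]
density = counts ./ widths;   % frequency density = frequency / class width

% The area of each rectangle (height times base) recovers the class frequency.
areas = density .* widths;

disp(density)
disp(areas)
```

Note that the widest class has the largest frequency but the smallest height, which is why heights and frequencies should not be confused when the classes have different widths.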

In the MATLAB environment, it is possible to create histograms with the histogram() function, which in the simplest form is written as:

>> histogram(x)

The x parameter represents a vector of numeric values. The elements of x are sorted into bins of uniform width, chosen automatically by MATLAB to cover the range between the minimum and maximum values of x and to reveal the underlying shape of the distribution. The histogram() function displays each bin as a rectangle whose height indicates the number of elements in that bin.

If the input is a matrix, the histogram() function treats it as a single column vector, x(:), and plots a single histogram. If the input is data of the categorical type, each bin represents a category of x.

Let's now look at an example: we will graph a set of values derived from a survey run on a number of users representative of the population. We store the survey results as the elements of a vector, which is then passed as the argument of the histogram() function:

>> Vect1=[10,25,12,13,33,25,44,50,43,26,38,32,31,28,30];
>> histogram(Vect1)

Since we did not provide any optional arguments, MATLAB automatically chose the number and width of the frequency classes that divide the range of values between 10 and 50 (the extreme values), as can be seen in the following figure:

Figure 3.12: Histogram of a continuous distribution
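
To see which bins were chosen automatically without reading them off the chart, one option is the histcounts() function, which is documented as using the same automatic binning as histogram() but returns the counts and edges instead of drawing them. The following is a minimal sketch; the variable names numBins and binWidth are ours.

```matlab
Vect1 = [10,25,12,13,33,25,44,50,43,26,38,32,31,28,30];

% Same automatic binning as histogram(Vect1), but without creating a figure.
[counts, edges] = histcounts(Vect1);

numBins  = numel(counts);          % number of classes chosen by MATLAB
binWidth = edges(2) - edges(1);    % the automatic bins have uniform width

fprintf('%d bins of width %g, from %g to %g\n', ...
    numBins, binWidth, edges(1), edges(end));
```

Every element of the vector falls into exactly one bin, so the counts always add up to the number of elements.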

Moreover, neither a chart title nor axis labels have been set. In this case, MATLAB has simply placed numeric tick labels on both axes: data values on the horizontal axis and counts on the vertical axis. Let's see what happens when we set the number of bins (rectangles), a title and labels for the axes, and finally the color of the rectangles:

>> Vect2=[10,25,12,13,33,25,44,50,43,26,38,32,31,28,30,15,16,22,...
35,18];
>> h = histogram(Vect2,12);
>> xlabel('Results')
>> ylabel('Frequency')
>> title('Customer Satisfaction Survey')
>> h.FaceColor = [0 0.5 0.5];

The following figure shows the histogram with the number of bins set by the user:

Figure 3.13: Histogram with number of bins set by the user

Let's look at the commands just introduced:

>> h = histogram(Vect2,12)

We set the number of bins, in our case 12. The following lines set the labels for the X and Y axes and the title of the chart:

>> xlabel('Results')
>> ylabel('Frequency')
>> title('Customer Satisfaction Survey')

Finally, to set the color of the rectangles, we assign to the FaceColor property of the histogram object h an RGB value of [0 0.5 0.5], corresponding to teal (a dark cyan):

>> h.FaceColor = [0 0.5 0.5];
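
As an alternative to setting the property afterwards, histogram() also accepts property names and values as name-value pairs at creation time. The following sketch, written as a script rather than at the command prompt, reproduces the same settings and reads back two properties of the returned object to confirm them.

```matlab
Vect2 = [10,25,12,13,33,25,44,50,43,26,38,32,31,28,30,15,16,22,35,18];

% Number of bins as second argument, FaceColor as a name-value pair.
h = histogram(Vect2, 12, 'FaceColor', [0 0.5 0.5]);
xlabel('Results')
ylabel('Frequency')
title('Customer Satisfaction Survey')

% The returned histogram object exposes the settings actually in use.
disp(h.NumBins)      % number of bins
disp(h.FaceColor)    % RGB triplet of the rectangles
```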

The result is shown in Figure 3.13. The number of bins in the graph corresponds exactly to the number we specified in the command, because the histogram() function accepts the number of bins as an optional second argument after the data vector. If more control over the bins is required, we can specify the breakpoints (bin edges) directly by passing, in place of the number of bins, a vector containing the values at which we want to divide the data. Let's look at an example: the vector used in the previous example will be divided into four bins. To do this, we define a new vector of edges running from the minimum value to the maximum value in steps of ten:

>> Vect3=[10,25,12,13,33,25,44,50,43,26,38,32,31,28,30,15,16,...
22,35,18];
>> nbin=10:10:50;
>> h = histogram(Vect3,nbin)
h =
Histogram with properties:
Data: [10 25 12 13 33 25 44 50 43 26 38 32 31 28 30 15 16 22 35 18]
Values: [6 5 6 3]
NumBins: 4
BinEdges: [10 20 30 40 50]
BinWidth: 10
BinLimits: [10 50]
Normalization: 'count'
FaceColor: 'auto'
EdgeColor: [0 0 0]
>> xlabel('Results')
>> ylabel('Frequency')
>> title('Customer Satisfaction Survey')
>> h.FaceColor = [0 0.5 0.5];

Thus, a different breakdown of classes is obtained, as shown in the following figure:

Figure 3.14: Histogram with user-set bin spacing
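
When the edges are given explicitly, each bin includes its left edge and excludes its right edge, except for the last bin, which includes both. This is why the value 50 is counted in the last class. The following sketch recounts the classes by hand and compares the result with histcounts(), which applies the same rule as histogram(); the loop and the variable name manual are ours.

```matlab
Vect3 = [10,25,12,13,33,25,44,50,43,26,38,32,31,28,30,15,16,22,35,18];
edges = 10:10:50;

% Counts computed by MATLAB for the given edges.
counts = histcounts(Vect3, edges);

% Manual recount: [left, right) for every bin, [left, right] for the last one.
manual = zeros(1, numel(edges) - 1);
for k = 1:numel(manual)
    if k < numel(manual)
        inBin = Vect3 >= edges(k) & Vect3 < edges(k+1);
    else
        inBin = Vect3 >= edges(k) & Vect3 <= edges(k+1);
    end
    manual(k) = sum(inBin);
end

disp(counts)   % 6 5 6 3, as in the Values property shown above
disp(manual)
```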

In data analysis, we are often more interested in the frequency density than in the raw frequency, because the frequency depends on the size of the sample. So, instead of counting the occurrences in each class, MATLAB can display a probability density estimate through the 'Normalization','pdf' name-value pair. Let us see how to proceed with a simple example: we define a new vector that contains 1000 values drawn at random from the standard normal distribution with randn(), and we plot the corresponding histogram (Figure 3.15):

>> Vect4 = randn(1000,1);
>> h = histogram(Vect4,'Normalization','pdf')
h =
Histogram with properties:
Data: [1000×1 double]
Values: [1×24 double]
NumBins: 24
BinEdges: [1×25 double]
BinWidth: 0.3000
BinLimits: [-3.3000 3.9000]
Normalization: 'pdf'
FaceColor: 'auto'
EdgeColor: [0 0 0]

Figure 3.15: Probability density of a normalized distribution

From Figure 3.15, we can note that the Y axis now gives a probability density estimate rather than a count: the height of each rectangle is the number of points in the class divided by the product of the sample size and the class width. If the breakpoints are equidistant, the height of each rectangle is proportional to the number of points that fall into the class. In every case, the areas of the rectangles (height times width) sum to one.
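
One way to check the normalization numerically is to multiply each density by its bin width and add up the results. The following sketch does this with histcounts(), which accepts the same 'Normalization' option as histogram(); the fixed seed passed to rng() is our assumption, used only to make the run reproducible.

```matlab
rng(0);                      % fixed seed, only so that the sketch is reproducible
Vect4 = randn(1000,1);

% Density estimate and bin edges, computed without drawing the chart.
[pdfValues, edges] = histcounts(Vect4, 'Normalization', 'pdf');

% The same densities rebuilt from the raw counts:
% count / (sample size * bin width).
counts    = histcounts(Vect4, edges);
widths    = diff(edges);
manualPdf = counts ./ (numel(Vect4) .* widths);

% Total area of the rectangles (height times width).
totalArea = sum(pdfValues .* widths);
disp(totalArea)
```

The sum of the heights alone is generally not one; it is the total area that is normalized.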

To get more detailed information about the results of the histogram() function, we can assign its output to a variable, as we did with h in the previous examples. The call still draws the chart, and the returned histogram object gives access to the computed values, such as the bin counts (Values) and the bin edges (BinEdges).
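
As a short sketch of how the saved output can be used, the following script reads the counts and edges back from the histogram object and derives the center of each class from them. The variable names centers and idx are ours.

```matlab
Vect3 = [10,25,12,13,33,25,44,50,43,26,38,32,31,28,30,15,16,22,35,18];

% Draw the chart and keep the histogram object.
h = histogram(Vect3, 10:10:50);

counts  = h.Values;                          % elements per class
edges   = h.BinEdges;                        % class boundaries
centers = edges(1:end-1) + diff(edges)/2;    % midpoint of each class

% Index of the first class with the highest count.
[~, idx] = max(counts);
fprintf('Most populated class is centered at %g\n', centers(idx));
```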

A particular type of histogram is the one plotted by the histfit() function; this function plots a histogram with a number of bins equal to the square root of the number of elements in the data and fits a normal density function to it. In the following example, we generate a sample of size 1000 from a normal distribution with mean 50 and standard deviation 3:

>> Vect5 = normrnd(50,3,1000,1);
>> Hist = histfit(Vect5)

In the following figure, a fitted normal density curve is superimposed on the histogram of the sample:

Figure 3.16: Histogram with a distribution fit
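
The output of histfit() is a vector of two graphics handles: the first refers to the bars and the second to the fitted curve. The following sketch uses them to restyle the chart and then estimates the parameters of the fitted normal distribution with fitdist(). Note that normrnd(), histfit(), and fitdist() belong to the Statistics and Machine Learning Toolbox; the fixed seed and the chosen colors are our assumptions.

```matlab
rng(0);                              % fixed seed, only for reproducibility
Vect5 = normrnd(50, 3, 1000, 1);     % mean 50, standard deviation 3

Hist = histfit(Vect5);               % Hist(1): bars, Hist(2): fitted curve
Hist(1).FaceColor = [0 0.5 0.5];     % recolor the rectangles
Hist(2).Color     = [0 0 0];         % draw the fitted curve in black

% Estimate the parameters of a normal distribution fitted to the sample.
pd = fitdist(Vect5, 'Normal');
fprintf('Estimated mean: %.2f, estimated standard deviation: %.2f\n', ...
    pd.mu, pd.sigma);
```

With a sample of this size, the estimates should fall close to the values 50 and 3 used to generate the data.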