2
PRTools Introduction

2.1 Motivation

Scientists should build their own instruments, or at least be able to open, investigate and understand the tools they are using. If, however, the tools are provided as a black box, there should be a manual or literature available that fully explains their ins and outs. In principle, scientists should be able to create their measurement devices from scratch; otherwise progress in science has no foundation.

In statistical pattern recognition one studies techniques for generalizing examples into decision rules to be used for the detection and recognition of patterns in experimental data. This research area has a strong computational character, demanding flexible use of numerical programs for data analysis as well as for the evaluation of the procedures. As new methods are still being proposed in the literature, a programming platform is needed that enables fast and flexible implementation.

Matlab® is the dominant programming language for implementing numerical computations and is widely used for algorithm development, simulation, data reduction, testing and system evaluation. Pattern recognition is studied in almost all areas of applied science. The use of a widely available numerical toolset like Matlab® may therefore be profitable, both for applying existing techniques and for studying new algorithms. Moreover, because of its general nature in comparison with more specialized statistical environments, it offers easy integration with the pre-processing of data of any nature. This is further facilitated by the large set of toolboxes available for Matlab®.

PRTools is a Matlab® toolbox originally designed by Robert P.W. Duin for pattern recognition research. The pattern recognition routines and support functions offered by PRTools represent a basic set largely covering the area of statistical pattern recognition. With the help of researchers in many areas, PRTools has been updated to version 5; it supports the simultaneous use of the Matlab® Statistics Toolbox (Stats) and integrates a number of its classifiers. Unless otherwise noted, all examples in this book are based on PRTools5.

PRTools has been used in many courses and PhD projects and has received hundreds of citations. It is especially useful for researchers and engineers who need a complete package for prototyping recognition systems, as it includes tools for representation. It offers most traditional and state-of-the-art off-the-shelf procedures for transformation, classification and evaluation. It is thereby well suited for comparative studies.

The notation used in the PRTools documentation and code differs slightly from that used throughout this book. In this chapter we try to follow the notation of the book. Table 2.1 lists the notation differences between this book and the PRTools documentation.

Table 2.1 Notation differences between this book and the PRTools documentation

Mathematical notation   Pseudo-code notation   Notation in PRTools   Meaning
T                       x, z                   a, b                  data set
n                       n                      m                     number of objects
N, D                    N, D                   k, n                  number of features, dimensions
K                       K                      c                     number of classes

2.2 Essential Concepts

For the automatic recognition of classes of objects, some measurements first have to be collected, for example using sensors; then the objects have to be represented, for example in a feature space; and after some possible feature-reduction steps they can finally be mapped by a classifier onto a set of class labels. Between the initial representation in the feature space and this final mapping onto the set of class labels, the representation may be changed several times: simplified feature spaces (feature selection), normalization of features (e.g. by scaling), linear or non-linear mappings (feature extraction) and classification by a possible set of classifiers, combining classifiers and the final labelling. In each of these steps the data are transformed by some mapping. Based on this observation the following two basic concepts of PRTools are defined:

  • Datasets: matrices in which the rows represent the objects and the columns the features, class memberships or other fixed sets of properties (e.g. distances to a fixed set of other objects). In PRTools4 and later versions an extension of the dataset concept has been defined: Datafiles, which refer to datasets to be created from directories of files.
  • Mappings: transformations operating on datasets. As pattern recognition has two stages, training and execution, mappings also come in two types: untrained and trained.

An untrained mapping refers just to the concept of a method, for example forward feature selection or PCA (see Chapter 7 of this book). It may have some parameters that are needed for training, for example the desired number of features or some regularization parameters. If an untrained mapping is applied to a dataset it will be trained (training).

A trained mapping is specific for the training set used to train the mapping. This dataset thereby determines the input dimensionality (e.g. the number of input features) as well as the output dimensionality (e.g. the number of output features or the number of classes). When a trained mapping is applied to a dataset it will transform the dataset according to its definition (execution).

In addition, fixed mappings are used. They are almost identical to trained mappings, except that they do not result from a training step but are directly defined by the user: for example, the transformation of distances by a sigmoid function to the [0, 1] interval. PRTools deals with sets of labelled or unlabelled objects and offers routines for the generalization of such sets into functions for mapping and classification. A classifier is thereby a special case of a mapping, as it maps objects onto class labels or onto [0, 1] intervals that may be interpreted as class memberships, soft labels or posterior probabilities. An object is a k-dimensional vector of feature values, distances, (dis)similarities or class memberships. Within PRTools these are usually just called features. It is assumed that for all objects in a problem the values of the same set of features are given. The space defined by the actual set of features is called the feature space. Objects are represented as points or vectors in this space. New objects in a feature space are usually converted gradually to labels by a series of mappings followed by a final classifier.
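As a minimal sketch of these three mapping types (the data generator gendatb and the mappings pca and sigm serve here only as convenient examples):

A = gendatb([50 50]);  % generate a labelled two-class dataset (2 features)
W_u = pca([],1);       % untrained mapping: PCA to one dimension
W_t = A*W_u;           % training: W_t is now a trained mapping
B = A*W_t;             % execution: transform the dataset by the trained mapping
C = B*sigm;            % fixed mapping: map the values to the [0, 1] interval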

Sets of objects may be given externally or may be generated by one of the data-generation routines of PRTools. Their labels may also be given externally or may be the result of a cluster analysis. By such techniques similar objects within a larger set are grouped (clustered). The similarity measure is defined by the clustering technique in combination with the object representation in the feature space. Some clustering procedures do not just generate labels but also a classifier that classifies new objects in the same way. A fundamental problem is to find a good distance measure that agrees with the dissimilarity of the objects represented by the feature vectors. Throughout PRTools the Euclidean distance is used as the default. However, scaling the features and transforming the feature spaces by different types of mappings effectively changes the distance measure.

The dimensionality of the feature space may be reduced by the selection of subsets of good features. Several strategies and criteria are possible for searching for good subsets. Feature selection is important because it decreases the number of features that have to be measured and processed. In addition to the improved computational speed in lower-dimensional feature spaces there might also be an increase in the accuracy of the classification algorithms. Another way to reduce the dimensionality is to map the data onto a linear or non-linear subspace. This is called linear or non-linear feature extraction. It does not necessarily reduce the number of features to be measured, but the advantage of an increased accuracy may still be gained. Moreover, as lower-dimensional representations yield less complex classifiers, better generalization can be obtained.

Using a training set, a classifier can be trained such that it generalizes this set of examples of labelled objects into a classification rule. Such a classifier can be linear or non-linear and can be based on two different kinds of strategies. The first strategy minimizes the expected classification error by using estimates of the probability density functions. In the second strategy this error is minimized directly by optimizing the classification function with respect to its performance over the learning set or a separate evaluation set. In this approach it has to be avoided that the classifier becomes entirely adapted to the training set, including its noise, as this decreases its generalization capability. This ‘overtraining’ can be circumvented by several types of regularization (often used in neural network training). Another technique is to simplify the classification function afterwards (e.g. the pruning of decision trees).

In PRTools4 and later versions, the possibility of an automatic optimization has been introduced for parameters controlling the complexity or the regularization of the training procedures of mappings and classifiers. This is based on a cross-validation (see below) over the training set and roughly increases the time needed for training by a factor of 100. Constructed classification functions may be evaluated by independent test sets of labelled objects. These objects have to be excluded from the training set, otherwise the evaluation becomes optimistically biased. If they are added to the training set, however, better classification functions can be expected. A solution to this dilemma is the use of cross-validation and rotation methods, by which a small fraction of the objects is excluded from training and used for testing. This fraction is rotated over the available set of objects and the results are averaged. The extreme case is the leave-one-out method, for which the excluded fraction is as small as a single object.
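As a sketch, such an evaluation may be run with the PRTools routine crossval; the call convention crossval(A,classifier,nfolds) and the leave-one-out default are assumptions that should be checked against the PRTools help:

A = gendatb([50 50]);      % labelled two-class dataset
e10 = crossval(A,ldc,10);  % 10-fold cross-validation error of ldc
eloo = crossval(A,ldc);    % assumed default: leave-one-out estimate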

The performance of classification functions can be improved by the following methods:

  1. A reject option in which the objects close to the decision boundary are not classified. They are rejected and might be classified by hand or by another classifier.
  2. The selection or averaging of classifiers.
  3. A multistage classifier for combining classification results of several other classifiers.

For all these methods it is profitable or necessary that a classifier yields some distance measure, confidence or posterior probability in addition to the hard, unambiguous assignment of labels.
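For the second method, the outputs of several classifiers trained on the same data can, for example, be averaged by the fixed combiner meanc (a sketch; PRTools also offers combiners such as prodc, maxc and votec):

A = gendatb([100 100]);        % training data
W = [ldc(A),knnc(A,3)]*meanc;  % average the outputs of two trained classifiers
e = A*W*testc;                 % apparent error of the combined classifier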

2.3 PRTools Organization Structure and Implementation

PRTools makes use of the possibility offered by Matlab® to define ‘Classes’ and ‘Objects’. These programming concepts should not be confused with the classes and objects of pattern recognition. The two main ‘Classes’ defined in PRTools are dataset and mapping. As a child of dataset, datafile has also been defined, inheriting most properties of dataset. A large number of operators (like * or []) and Matlab® commands have been overloaded and thereby have a special meaning when applied to a dataset and/or a mapping.

The central data structure of PRTools is the dataset. It primarily consists of a set of objects represented by a matrix of feature vectors. Attached to this matrix is a set of labels, one for each object, and a set of feature names, also called feature labels. Labels can be integer numbers or character strings. Moreover, a set of prior probabilities, one for each class, is stored. In most help files of PRTools a dataset is denoted by A, and in almost any routine it is one of the inputs. Almost all routines can handle multiclass object sets. It is possible that for some objects no label is specified (a NaN or an empty string is used). Such objects are, unless otherwise mentioned, skipped during training. It is possible to define more than one set of labels in a dataset. For instance, when the objects are pixels in an image, they may be labelled according to their image segment, but also according to the image, the sensor used, or the location where the image was acquired.

Data structures of the ‘Classes’ mapping store data transformations (‘mappings’), classifiers, feature extracting results, data scaling definitions, non-linear projections, etc. They are usually denoted by W.

The easiest way to apply a mapping W to a dataset A is by A*W. The matrix multiplication symbol * is overloaded for this purpose. The operation may also be written as map(A,W). As everywhere else in Matlab®, concatenations of operations are possible, for example A*W1*W2*W3, and they are executed from left to right.

In the beginning of the pattern recognition chain (see Figure 2.1), in the constitution of feature vectors, mappings can also be used. Raw data such as images may be stored on disk, and by image processing and image analysis properties may be measured that can be used as features. The definition of raw data items is enabled in PRTools by the programming class called datafile. Datafiles are a kind of pre-stage of datasets. By mappings defining the proper pre-processing, a dataset may be constructed. By subsequent mappings, classifiers can be trained and applied, resulting in an overall recognition system.


Figure 2.1 The workflow of PRTools.

Figure 2.1 shows the workflow of using PRTools for pattern recognition problems. The total recognition workflow in PRTools terms consists of the following steps:

1. Collect raw data on disk
2. Define a datafile such as A pointing to the raw data
3. Define a mapping such as W_preproc for appropriate
   preprocessing and analysis of the datafile
4. Apply the mapping to the datafile, resulting in a dataset,
   B = A*W_preproc
5. Define a suitable conversion of the feature space, e.g. by PCA:
   W_featred
6. Apply this mapping to B: C = B*W_featred
7. Train a classifier in this space: W_classf
8. Apply this classifier to the dataset:
   labels = C*W_classf
          = B*W_featred*W_classf
          = A*W_preproc*W_featred*W_classf.

As the mappings W_preproc, W_featred and W_classf are stored in variables and as the concatenations of a sequence of mappings is defined in PRTools, the entire recognition system can be stored in a single variable: W_recsys=W_preproc*W_featred*W_classf. New objects, for example images stored on disk as a datafile A, can now be classified by labels=A*W_recsys.

In this example three mappings have to be specified by the user. The first, W_preproc, is usually based entirely on the background knowledge the user has of the type of images he or she wants to classify. The other two, the feature reduction and the classifier, have to be derived from data, based on an optimization of a cost function or an estimation of parameters given a model assumption. In pattern recognition terms these mappings are thereby the result of training. Datasets are needed for this, based on the same pre-processing and representation as the data to be classified later. There are many routines in PRTools available for training mappings and classifiers. This is in fact the core of the toolbox.

Consequently, we distinguish two sets of objects: a training set with given labels (class memberships) to be used for designing the system and an unlabelled set for which the class memberships have to be found. The first step of the program is the definition of these sets such that they can be handled by PRTools. Let us assume that the raw data has been stored in two directories, ‘directory_1’ and ‘directory_2’:

A_labeled = datafile('directory_1');
A_unlabeled = datafile('directory_2');

It will be described later how the labels of A_labeled have to be supplied and how they are stored. The first mapping has to define features for the objects. A simple choice is the use of histograms, which can be specified by the following mapping:

W_preproc = histm([],[1:256]);

The pre-processing of the two datafiles and their conversion to datasets is performed by

B_labeled = dataset(A_labeled*W_preproc);
B_unlabeled = dataset(A_unlabeled*W_preproc);

Let us assume that a feature reduction by PCA is demanded to five features. It has to be derived from the pre-processed data, of course:

W_featred = pca(B_labeled,5);

Suppose that finally the Fisher classifier is used. It has to be found in the reduced feature space:

W_classf = fisherc(B_labeled*W_featred);

The labels for B_unlabeled can now be estimated by

labels = B_unlabeled*W_featred*W_classf*labeld;

in which labeld is a standard PRTools mapping that maps classifier outcomes to labels. The complete classification system, including the pre-processing, can also be stored in a single variable W_class_sys and applied to the raw datafile:

W_class_sys = W_preproc*W_featred*W_classf*labeld;
labels = A_unlabeled*W_class_sys;

2.4 Some Details about PRTools

2.4.1 Datasets

Datasets in PRTools are in the Matlab® language defined as objects of the class DATASET. Below, the words ‘object’ and ‘class’ are used in the pattern recognition sense. A dataset is a set consisting of M objects, each described by K features. In PRTools, such a dataset is represented by an M × K matrix: M rows, each containing an object vector of K elements. Usually, a dataset is labelled. An example of a definition is

DATA = [RAND(3,2) ; RAND(3,2)+0.5];
LABS = ['A';'A';'A';'B';'B';'B'];
A = DATASET(DATA,LABS);

which defines a [6 × 2] dataset with 2 classes. The [6 × 2] data matrix (6 objects given by 2 features) is accompanied by labels, assigning each of the objects to one of the two classes A and B. Class labels can be numbers or strings and should always be given as rows in the label list. A label may also have the value NaN or may be an empty string, indicating an unlabelled object. If the label list is not given, all objects are marked as unlabelled. Various other types of information can be stored in a dataset. The simplest way to get an overview is by typing:

STRUCT(A)
which for the above example displays the following:
DATA: [6x2 double]
LABLIST: {2x4 cell}
NLAB: [6x1 double]
LABTYPE: 'crisp'
TARGETS: []
FEATLAB: [2x1 double]
FEATDOM: {1x2 cell }
PRIOR: []
COST: []
OBJSIZE: 6
FEATSIZE: 2
IDENT: [6x1 struct]
VERSION: {[1x1 struct] '21-Jul-2007 15:16:57'}
NAME: []
USER: []

These fields have the following meaning:

Field name Description
DATA An array containing the objects (the rows) represented by features (the columns). In the software and help-files, the number of objects is usually denoted by M and the number of features is denoted by K. Therefore, DATA has the size of [M,K]. This is also defined as the size of the entire dataset.
LABLIST The names of the classes. These can be strings stored in a character array; if they are numeric they are stored in a column vector. Mixtures of these are not supported. The LABLIST field is a structure in which more than a single label list and the corresponding priors and costs can be stored. PRTools automatically keeps track of this.
NLAB An [M x 1] vector of integers between 1 and C, defining for each of the M objects its class.
LABTYPE ‘CRISP’, ‘SOFT’ or ‘TARGETS’ are the three possible label types. In case of ‘CRISP’ labels, a unique class, defined by NLAB, is assigned to each object, pointing to the class names given in LABLIST. For ‘SOFT’ labels, each object has a corresponding vector of C numbers between 0 and 1 indicating its membership (or confidence or posterior probability) of each of the C classes. These numbers are stored in TARGETS of the size M x C. They do not necessarily sum to one for individual row vectors. Labels of type ‘TARGETS’ are in fact no labels, but merely target vectors of length C. The values are again stored in TARGETS and are not restricted in value.
TARGETS [M, C] array storing the values of the soft labels or targets.
FEATLAB A label list (like LABLIST) of K rows storing the names of the features.
FEATDOM A cell array describing for each feature its domain.
PRIOR Vector of length C storing the class prior probabilities. They should sum to one. If PRIOR is empty ([]) it is assumed that the class prior probabilities correspond to the class frequencies.
COST Classification cost matrix. COST(I,J) are the costs of classifying an object from class I as class J. Column C+1 generates an alternative reject class and may be omitted, yielding a size of [C,C]. An empty cost matrix, COST = [] (default), is interpreted as COST = ONES(C)-EYE(C) (identical costs of misclassification).
OBJSIZE The number of objects, M. In case the objects are related to an n-dimensional structure, OBJSIZE is a vector of length n, storing the size of this structure. For instance, if the objects are pixels in a [20 x 16] image, then OBJSIZE = [20, 16] and M = 320.
FEATSIZE The number of features, K. In case the features are related to an n-dimensional structure, FEATSIZE is a vector of length n, storing the size of this structure. For instance, if the features are pixels in a [20 x 16] image, then FEATSIZE = [20, 16] and K = 320.
IDENT A structure array of M elements storing user defined fields giving additional information on each of the objects.
VERSION Some information related to the version of PRTools used for defining the dataset.
NAME A character string naming the dataset, possibly used to annotate related graphics.
USER A structure with user defined fields not used by PRTools.

The fields can be set in the following ways:

  1. In the DATASET construction command after DATA and LABELS using the form of {field name, value pairs}, for example
    • A = DATASET(DATA,LABELS,'PRIOR',[0.4 0.6],'FEATLIST',['AA';'BB']);
    Note that the elements in PRIOR refer to classes as they are ordered in LABLIST.
  2. For a given dataset A, the fields may be changed similarly by the SET command:
    • A = SET(A,'PRIOR',[0.4 0.6],'FEATLIST',['AA';'BB']);
  3. By the commands
    • SETDATA, SETFEATLAB, SETFEATDOM, SETFEATSIZE,
    • SETIDENT, SETLABELS, SETLABLIST, SETLABTYPE,
    • SETNAME, SETNLAB, SETOBJSIZE, SETPRIOR,
    • SETTARGETS, SETUSER.
  4. By using the dot extension as for structures, for example
    • A.PRIOR = [0.4 0.6];
    • A.FEATLIST = ['AA';'BB'];
    Note that there is no field LABELS in the DATASET definition. Labels are converted to NLAB and LABLIST. Commands like SETLABELS and A.LABELS, however, exist and take care of the conversion.

The data and information stored in a dataset can be retrieved as follows:

  1. By DOUBLE(A) and by +A the content of A.DATA is returned. By
    • [N,LABLIST] = CLASSSIZES(A);
    the numbers of objects per class and the class names stored in LABLIST are returned. DISPLAY(A) writes the size of the dataset, the number of classes and the label type to the terminal screen. SIZE(A) returns the size of A.DATA: the numbers of objects and features. SCATTERD(A) makes a scatter plot of the dataset. SHOW(A) may be used to display images that are stored as features or as objects in a dataset.
  2. By the GET command, for example
    • [PRIOR, FEATLIST] = GET(A,‘PRIOR’,‘FEATLIST’);
  3. By the commands:
    • GETDATA, GETFEATLAB, GETFEATSIZE, GETIDENT,
    • GETLABELS, GETLABLIST, GETLABTYPE, GETNAME,
    • GETNLAB, GETOBJSIZE, GETPRIOR, GETCOST,
    • GETSIZE, GETTARGETS, GETUSER,
    • GETVERSION.
    Note that GETSIZE(A) does not refer to a single field, but it returns [M,K,C]. The following commands do not return the data itself, instead they return indices to objects that have specific identifiers, labels or class indices:
    • FINDIDENT, FINDLABELS, FINDNLAB.
  4. Using the dot extension as for structures, for example
    • PRIOR = A.PRIOR;
    • FEATLIST = A.FEATLIST;
    Many standard Matlab® operations and a number of general Matlab® commands have been overloaded for variables of the DATASET type.

2.4.2 Datafiles

Datafiles are constructed to solve the memory problem connected with datasets. The latter are always held in memory, and their size is thereby restricted by the size of the computer memory. As copies of datasets are often (temporarily) created during processing, it is in practice advisable to keep datasets under 10 million elements (objectsize × featuresize). A number of operations handle datasets sequentially, or can be written like that, for example fixed mappings, testing and the training of some simple classifiers. For these there is no problem in having the data stored on disk and reading it only when needed. The datafile construct enables this by providing an administration that keeps track of the data and has all additional information ready to reshape the desired pieces of data into a dataset when they have to be processed.

In understanding the handling of datafiles it is important to keep this characteristic in mind: they are just administration and the real processing is postponed until it is needed. PRTools makes use of this characteristic to integrate in the datafile concept raw pre-processing of, for instance, images and signals. Arbitrary pre-processing of images can be defined for datafiles. It is the responsibility of the user that a pre-processing sequence ends in a format that can be straightforwardly transformed into a dataset. For example, after pre-processing all images have to be of the same size or generate feature vectors of the same length.

Datafiles are formally children of the ‘class’ dataset. Consequently they inherit the properties of datasets and ideally every operation that can be performed on a dataset can also be performed on a datafile. As can be understood from the above, this cannot always be true due to memory restrictions. There is a general routine, prmemory, by which the user can define what the maximum dataset sizes are that can be allowed. PRTools makes use of this to split datafiles into datasets of feasible sizes. An error is generated when this is not possible.
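For example (a sketch; it is assumed here that prmemory accepts the maximum number of dataset elements as its argument, which should be verified in the PRTools help):

prmemory(50000000);  % assumed call: allow datasets of up to 5*10^7 elements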

Routines that do not accept datafiles should return understandable error messages if called with a datafile. In most cases help text is included when routines accept datafiles.

The use of datafiles starts with the datafile construct, which points to a directory in which each file is interpreted as an object. There is a savedatafile command that executes all processing defined for a datafile and stores the data on disk, ready to be used for defining a new datafile. This is the only place where PRTools writes data to disk. For more details, read the next section.

2.4.3 Datafiles Help Information

Datafiles in PRTools are in Matlab® language defined as objects of the class DATAFILE. They inherit most of their properties from the class DATASET. They are a generalisation of this class allowing for large datasets distributed over a set of files. Before conversion to a dataset pre-processing can be defined. There are four types of datafiles:

Datafiles type Description
raw Every file is interpreted as a single object in the dataset. These files may, for instance, be images of different size.
pre-cooked In this case the user should supply a command that reads a file and converts it to a dataset.
half-baked All files should be mat-files, containing a single dataset.
mature This is a datafile created by PRTools by the SAVEDATAFILE command after execution of all pre-processing defined for the datafile.

A datafile is, like a dataset, a set consisting of M objects, each described by K features. K might be unknown, in which case it is set to zero, K=0. Datafiles store an administration about the files or directories in which the objects are stored. In addition they can store commands to pre-process the files before they are converted to a dataset and post-processing commands, to be executed after conversion to a dataset.

Datafiles are mainly an administration. Operations on datafiles are possible as long as they can be stored (e.g. filtering of images for raw datafiles or object selection by GENDAT). Commands that are able to process objects sequentially, like NMC and TESTC, can be executed on datafiles.

Whenever a raw datafile is sufficiently defined by pre- and post-processing it can be converted into a dataset. If the result is still too large for the available memory, it should be stored by the SAVEDATAFILE command, ready for later use. If it is sufficiently small it can be directly converted into a dataset by DATASET.
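A minimal sketch of this flow (the directory name is a placeholder, and the per-object resizing assumes that the Image Processing Toolbox function imresize is available to filtm):

A = datafile('my_images');           % raw datafile: one object per file
W = filtm([],'imresize',{[32 32]});  % per-object pre-processing (assumed: imresize)
B = dataset(A*W);                    % execute the processing, convert to a dataset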

The main commands specific for datafiles are:

Command name Function description
DATAFILE Constructor. It defines a datafile on a directory.
ADDPREPROC Adds preprocessing commands (low level command).
ADDPOSTPROC Adds postprocessing commands (low level command).
FILTM User interface to add preprocessing to a datafile.
SAVEDATAFILE Executes all defined pre- and postprocessing and stores the result as a dataset in a set of matfiles.
DATASET Conversion to dataset.

Datafiles have the following fields, in addition to all dataset fields:

Field name Function description
ROOTPATH Absolute path of the datafile
FILES Names of directories (for raw datafiles) or mat-files (for converted datafiles)
TYPE Datafile type
PREPROC Pre-processing commands in a struct array
POSTPROC Post-processing commands as mappings
DATASET Stores all dataset fields.

Almost all operations defined for datasets are also defined for datafiles, with a few exceptions. Fixed and trained mappings can also handle datafiles as they process objects sequentially. The use of untrained mappings in combination with datafiles is a problem, as they have to be adapted to the sequential use of the objects. Mappings that can handle datafiles are indicated in the Contents file.

The possibility to define pre-processing of objects (e.g. images) with different sizes makes datafiles useful for handling raw data and measurements of features.

2.4.4 Classifiers and Mappings

There are many commands to train and use mappings between spaces of different (or equal) dimensionalities. For example:

If A is an m by k dataset (m objects in a k-dimensional space)

and W is a k by n mapping (map from k to n dimensions)

then A*W is an m by n dataset (m objects in a n-dimensional space).

Mappings can be linear or affine (e.g. a rotation plus a shift) as well as non-linear (e.g. a neural network). Typically they can be used as classifiers. In that case a k by n mapping maps a k-feature data vector onto the output space of an n-class classifier (an exception: 2-class classifiers such as discriminant functions may be implemented by a mapping to a one-dimensional space, such as the distance to the discriminant, n = 1).

Mappings are of the data type ‘mapping’ (class(W) is ‘mapping’) and have a size of [k,n] if they map from k to n dimensions. Mappings can be instructed to assign labels to the output columns, for example the class names. These labels can be retrieved by

labels = getlabels(W);  before the mapping or

labels = getlabels(A*W);  after the dataset A is mapped by W.

Mappings can be learned from examples, (labelled) objects stored in a dataset A, for instance by training a classifier:

W1 = ldc(A);  the normal densities based linear classifier

W2 = knnc(A,3);  the 3-nearest neighbour rule

W3 = svc(A,'p',2);  the support vector classifier based on a second-order polynomial kernel

Untrained or empty mappings are supported. They may be very useful. In this case the dataset is replaced by an empty set or entirely skipped:

V1 = ldc; V2 = knnc([],3); V3 = svc([],'p',2);

Such mappings can be trained later by

W1 = A*V1; W2 = A*V2; W3 = A*V3;

(which is equivalent to the statements a few lines above) or by using cell arrays

V = {ldc, knnc([],3), svc([],'p',2)}; W = A*V;

The mapping of a test set B by B*W1 is now equivalent to B*(A*V1). Note that expressions are evaluated from left to right, so B*A*V1 will result in an error as the multiplication of the two datasets (B*A) is executed first.

Some trainable mappings do not depend on class labels and can be interpreted as finding a feature space that approximates as well as possible the original dataset given some conditions and measures. Examples are the Karhunen–Loève mapping (klm), principal component analysis (pca) and kernel mapping (kernelm), by which non-linear kernel-PCA mappings can be computed.
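For example (a small sketch; a second parameter between 0 and 1 makes pca retain that fraction of the variance):

A = gendatd([50 50],10);  % two classes in a 10-dimensional space
W = pca(A,0.9);           % trained mapping retaining 90% of the variance
B = A*W;                  % project the data onto the subspace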

In addition to trainable mappings there are fixed mappings: operations that are not computed from a training set but defined by just a few parameters. A number of them can be set by cmapm; other ones are sigm and invsigm.

The result D of mapping a test set on a trained classifier, D = B*W1, is again a dataset, storing for each object in B the output values of the classifier. For discriminants these are sigmoids of distances, mapped onto the [0, 1] interval; for neural networks, their unnormalized outputs; and for density-based classifiers, the densities. For all of them, the larger the value, the more similar the object is to the corresponding class. The values in a single row (object) do not necessarily sum to one. Normalization can be achieved by the fixed mapping classc:

D = B*W1*classc

The values in D can be interpreted as posterior probability estimates or classification confidences. Such a classification dataset has column labels (feature labels) for the classes and row labels for the objects. The class labels of the maximum values in each object row can be retrieved by

labels = D*labeld; or labels = labeld(D);

A global classification error follows from

e = D*testc; or e = testc(D);
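Putting these pieces together, a complete train-and-test cycle may look as follows (a sketch using the data generator gendatb and the splitting routine gendat):

A = gendatb([100 100]);  % generate a labelled two-class dataset
[T,S] = gendat(A,0.5);   % split it 50/50 into a training set T and a test set S
W1 = ldc(T);             % train the normal-densities-based linear classifier
D = S*W1*classc;         % classify the test set and normalize the confidences
e = D*testc;             % estimated classification error
labels = D*labeld;       % class labels assigned to the test objects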

2.4.5 Mappings Help Information

Mappings in PRTools are in the Matlab® language defined as objects of the class MAPPING. In the text below, the words ‘object’ and ‘class’ are used in the pattern recognition sense.

In the Pattern Recognition Toolbox PRTools there are many commands to define, train and use mappings between spaces of different (or equal) dimensionalities. Mappings operate mainly on datasets, that is, variables of the type DATASET (see also DATASETS), and generate datasets and/or other mappings. For example, if A is an M × K dataset (M objects in a K-dimensional space) and W is a K × N mapping (a map from K to N dimensions), then A*W is an M × N dataset (M objects in an N-dimensional space). This is enabled by overloading the *-operator for MAPPING variables. A*W is executed by MAP(A,W) and may also be called as such.

Mappings can be linear (e.g. a rotation) as well as non-linear (e.g. a neural network). Typically they are used to represent classifiers. In that case, a K × C mapping maps a K-feature data vector on the output space of a C-class classifier (an exception occurs when some 2-class classifiers, like the discriminant functions, may be implemented by a mapping on to a one-dimensional space determined by the distance to the discriminant).

Mappings are of the data-type MAPPING (CLASS(W) is a MAPPING) and have a size of K x C if they map from K to C dimensions. Four types of mapping are defined:

- untrained, V = A*W

This trains the untrained mapping W, resulting in the trained mapping V. W has to be defined by W = MAPPING(MAPPING_FILE,{PAR1,PAR2}), in which MAPPING_FILE is the name of the routine that executes the training and PAR1 and PAR2 are two parameters that have to be included in the call to the MAPPING_FILE. Consequently, A*W is executed by PRTools as MAPPING_FILE(A,PAR1,PAR2).

Example: train the 3-NN classifier on the generated data
W = knnc([],3); % untrained classifier
V = gendatd([50 50])*W; % trained classifier
 
- trained, D = B*V

This maps the dataset B on the trained mapping or classifier V, for example as trained above. The resulting dataset D has as many objects (rows) as B, but its feature size is now C if V is a K x C mapping. Typically, C is the number of classes in the training set A or a reduced number of features determined by the training of V. V is defined by V = MAPPING(MAPPING_FILE,'trained',DATA,LABELS,SIZE_IN,SIZE_OUT), in which MAPPING_FILE is the name of the routine that executes the mapping, DATA is a field in which the parameters needed for the mapping execution (e.g. weights) are stored, LABELS are the feature labels to be assigned to the resulting dataset D = B*V (e.g. the class names) and SIZE_IN and SIZE_OUT are the dimensionalities of the input and output spaces. They are used for error checking only. D = B*V is executed by PRTools as MAPPING_FILE(B,V).

Example: 
A = gendatd([50 50],10);% generate random 10D datasets
B = gendatd([50 50],10);
W = klm([],0.9); % untrained mapping, Karhunen-Loeve projection
V = A*W; % trained mapping V
D = B*V; % the result of the projection of B onto V
 
- fixed, D = A*W

This maps the dataset A by the fixed mapping W, resulting in a transformed dataset D. Examples are scaling and normalization; for example, W = MAPPING('SIGM','fixed',S) defines a fixed mapping by the sigmoid function SIGM and a scaling parameter S. A*W is executed by PRTools as SIGM(A,S).

Example: normalize the distances of all objects in A such that their
city block distances to the origin are one.
A = gendatb([50 50]);
W = normm;
D = A*W;
 
- combiner, U = V*W

This combines two mappings. The mapping W is able to combine itself with V and produces a single mapping U. A combiner is defined by W = MAPPING(MAPPING_FILE,'combiner',{PAR1,PAR2}), in which MAPPING_FILE is the name of the routine that executes the combining and PAR1 and PAR2 are the parameters that have to be included in the call to the MAPPING_FILE. Consequently, V*W is executed by PRTools as MAPPING_FILE(V,PAR1,PAR2). In a call like D = A*V*W, first B = A*V is resolved, which may result in a dataset B. Consequently, W should be able to handle datasets, and MAPPING_FILE is then called by MAPPING_FILE(B,PAR1,PAR2). Remark: the combiner construction is not necessary, since PRTools stores U = V*W as a SEQUENTIAL mapping (see below) if W is not a combiner. The construction of combiners, however, may increase the transparency for the user and the efficiency of computations.

Example:
A = gendatd([50 50],10);% generate random 10D datasets
B = gendatd([50 50],10);
V = klm([],0.9); % untrained Karhunen-Loeve (KL) projection
W = ldc; % untrained linear classifier LDC
U = V*W; % untrained combiner
T = A*U; % trained combiner
D = B*T; % apply the combiner (first KL projection,
         % then LDC) to B

Differences between the four types of mappings are now summarized for a dataset A and a mapping W:

A*W -untrained: results in a mapping
    -trained: results in a dataset, size checking
    -fixed: results in a dataset, no size checking
    -combiner: treated as fixed

Suppose V is a fixed mapping, then for the various possibilities of the mapping W, the following holds:

A*(V*W) -untrained: evaluated as V*(A*V*W), resulting in a mapping
        -trained: evaluated as A*V*W, resulting in a dataset
        -fixed: evaluated as A*V*W, resulting in a dataset
        -combiner: evaluated as A*V*W, resulting in a dataset

2.4.6 How to Write Your Own Mapping

Users can add new mappings or classifiers by a single routine that should support the following type of calls:

  • W = mymapm([],par1,par2,…); defines the untrained, empty mapping.
  • W = mymapm(A,par1,par2,…); trains the mapping on the training dataset A.
  • B = mymapm(A,W); defines the mapping of dataset A on W, resulting in a dataset B.

Below the subspace classifier subsc is listed. This classifier approximates each class by a linear subspace and assigns new objects to the class of the closest subspace found in the training set. The dimensionality of the subspaces can be set directly by W = subsc(A,N), in which the integer N determines the dimensionality of all class subspaces, or by W = subsc(A,alf), in which alf is the desired fraction of the retained variance, for example alf = 0.95. In both cases the class subspaces V are determined by a principal component analysis of the single-class datasets.

The three possible types of calls listed above are handled in the three main parts of the routine. If no input parameters are given (nargin < 1) or no input dataset is found (A is empty) an untrained classifier is returned. This is useful for calls like W = subsc([],N), defining an untrained classifier that can be used in routines like cleval(A,W,…) that operate on arbitrary untrained classifiers, but also to facilitate training by constructions as W = A*subsc or W = A*subsc([],N).

The training section of the routine is accessed if A is not empty and N is either not supplied or set by the user as a double (i.e. the subspace dimensionality or the fraction of the retained variance). PRTools takes care that calls like W = A*subsc([],N) are executed as W = subsc(A,N). The first parameter in the mapping definition, W = mapping(mfilename,…, is substituted by Matlab® as 'subsc' (mfilename is a function that returns the name of the calling file). This string is stored by PRTools in the mapping_file field of the mapping W and used to call subsc whenever it has to be applied to a dataset.

The trained mapping W can be applied to a test dataset B by D = B*W or by D = map(B,W). Such a call is converted by PRTools to D = subsc(B,W). Consequently, the second parameter of subsc(), N, is now substituted by the mapping W. This is handled in the final part of the routine. Here the data stored in the data field of W during training are retrieved (class means, rotation matrices and mean square distances of the training objects) and used to compute the normalized distances of the test objects to the various subspaces. Finally, these are converted to densities, assuming a normal distribution of the distances. The values are returned in a dataset using the setdata routine. This dataset is thereby similar to the input dataset: it contains the same object labels, object identifiers, and so on. Just the data are changed, and the columns now refer to classes instead of features.

%SUBSC Subspace Classifier
%
% W = SUBSC(A,N)
% W = SUBSC(A,FRAC)
%
% INPUT
% A Dataset
% N or FRAC Desired model dimensionality or fraction of retained
% variance per class
%
% OUTPUT
% W Subspace classifier
%
% DESCRIPTION
% Each class in the training set A is described by a linear subspace
% of dimensionality N, or such that at least a fraction FRAC of its
% variance is retained. This is realised by calling PCA(AI,N) or
% PCA(AI,FRAC) for each subset AI of A (objects of class I). For each
% class a model is built that assumes that the distances of the
% objects to the class subspaces follow a one-dimensional
% distribution. New objects are assigned to the class of the nearest
% subspace. Classification by D = B*W, in which W is a trained
% subspace classifier and B is a test set, returns a dataset D with
% one-dimensional densities for each of the classes in its columns.
% If N (ALF) is NaN it is optimised by REGOPTC.
%
% REFERENCE
% E. Oja, The Subspace Methods of Pattern Recognition, Wiley,  
% 1984.
%
% See DATASETS, MAPPINGS, PCA, FISHERC, FISHERM, GAUSSM, REGOPTC
% Copyright: R.P.W. Duin, [email protected]
% Faculty EWI, Delft University of Technology
% P.O. Box 5031, 2600 GA Delft, The Netherlands
 
function W = subsc(A,N)
 
name = 'Subspace classf.';
 
% handle default
if nargin < 2, N = 1; end
 
% handle untrained calls like subsc([],3);
if nargin < 1 | isempty(A)
    W = mapping(mfilename,{N});
    W = setname(W,name);
    return
end
 
if isa(N,'double')&isnan(N) %optimize regularization parameter
    defs = {1};
    parmin_max = [1,size(A,2)];
    W = regoptc(A,mfilename,{N},defs,[1],parmin_max,testc([],'soft'),0);
elseif isa(N,'double')
 
% handle training like A*subsc, A*subsc([],3), subsc(A)
% PRTools takes care that they are all converted to subsc(A,N)
 
  islabtype(A,'crisp');           % allow crisp labels only
  isvaldfile(A,1,2);              % at least one object per class,
                                  % two classes; allow for datafiles
  A = testdatasize(A,'features'); % test whether the data fit in memory
  A = setprior(A,getprior(A));    % avoid many warnings
  [m,k,c] = getsize(A);           % size of the training set
  for j = 1:c                     % run over all classes
    B = seldat(A,j);              % get the objects of a single class only
    u = mean(B);                  % compute its mean
    B = B - repmat(u,size(B,1),1);% subtract the mean
    v = pca(B,N);                 % compute the PCA for this class
    v = v*v';                     % trick: affine mapping in the original space
    B = B - B*v;                  % differences of objects and their mappings
    s = mean(sum(B.*B,2));        % mean square error w.r.t. the subspace
    data(j).u = u;                % store the mean
    data(j).w = v;                % store the mapping
    data(j).s = s;                % store the mean square distance
  end
  % define the trained mapping, store class labels and size
  W = mapping(mfilename,'trained',data,getlablist(A),k,c);
  W = setname(W,name);
elseif isa(N,'mapping')
% handle evaluation of a trained subspace classifier W for a 
% dataset A.
% The command D = A*W is by PRTools translated into D = subsc(A,W)
% Such a call is detected by the fact that N appears to be a
% mapping.
W = N;           % avoid confusion: call the mapping W
m = size(A,1);   % number of test objects
[k,c] = size(W); % mappingsize: from K features to C classes
d = zeros(m,c);  % output: C class densities for M objects
 
for j = 1:c                      % run over all classes
  u = W.data(j).u;               % class mean in the training set
  v = W.data(j).w;               % mapping to the subspace in the original space
  s = W.data(j).s;               % mean square distance
  B = A - repmat(u,m,1);         % subtract the mean from the test set
  B = B - B*v;                   % differences of objects and their mappings
  d(:,j) = sum(B.*B,2)/s;        % convert to distance and normalise
end
d = exp(-d/2)/sqrt(2*pi);        % convert to normal density
A = dataset(A);                  % make sure A is a dataset
d = setdata(A,d,getlabels(W));   % replace the data of A by d and use the
                                 % class labels given in W; other information
                                 % in A is preserved
W = d;                           % return the result in the output variable W
else
    error('Illegal call') % this should not happen
end
return

2.5 Selected Bibliography

  1. http://www.37steps.com/
  2. http://rduin.nl/manual/
  3. http://prtools.org/prtools/