Deep learning for multi-label classification

In the paper Large-scale Multi-label Text Classification – Revisiting Neural Networks (https://arxiv.org/abs/1312.5419), Nam et al. approach this problem using a deep multi-layer perceptron (MLP) with a hidden layer and output units that produce scores for the labels. They use a label predictor that converts the label scores from the deep network into binary decisions by thresholding, with the threshold chosen based on a rank loss function. The details of this approach can be found in the aforementioned paper.

The following diagram illustrates this approach:

Approach for multi-label classification

In the preceding diagram, the output consists of nine possible labels to which thresholding is applied. The threshold that yields the highest F1 score (the middle one in this example) is picked. The blue bars are the labels that are relevant for the training example.
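To make the thresholding step concrete, here is a minimal sketch of picking the threshold that maximizes the F1 score for a single example. The scores and labels below are made up for illustration, and the paper's actual predictor learns its thresholds via a rank loss rather than this brute-force search:

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical label scores from the network for one training example,
# and the binary ground-truth vector (1 = relevant label).
scores = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.05, 0.4])
y_true = np.array([1, 0, 1, 0, 1, 0, 0, 0, 0])

best_t, best_f1 = None, -1.0
# Try each observed score as a candidate threshold.
for t in np.sort(scores):
    y_pred = (scores >= t).astype(int)
    f1 = f1_score(y_true, y_pred)
    if f1 > best_f1:
        best_t, best_f1 = t, f1

print(best_t, best_f1)  # threshold with the highest F1 for this example
```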

While many state-of-the-art deep learning approaches to text classification target multi-class problems, they can be adapted to multi-label problems by taking the top-k class predictions or by applying thresholding. We will look at some of these multi-class models that can be used for multi-label classification.

One popular recent deep learning method is fastText, explored in the paper Bag of Tricks for Efficient Text Classification (https://arxiv.org/abs/1607.01759) by A. Joulin et al. In fastText, a document is represented by averaging the word embedding vectors of the words appearing in it. This averaged document vector is then passed to a softmax layer that outputs the class probabilities. This approach, therefore, does not take word order into consideration.

In the paper Convolutional Neural Networks for Sentence Classification (http://www.aclweb.org/anthology/D14-1181) by Yoon Kim, a CNN is applied to the concatenated word embeddings of a document. Several filters are run over the document, and their outputs are fed to a max-over-time pooling layer. This is followed by a fully connected layer with softmax outputs corresponding to the L labels. This was the approach we used in one of our earlier examples of text classification using a CNN.

In the paper Deep Learning for Extreme Multi-label Text Classification (https://dl.acm.org/citation.cfm?id=3080834), Liu et al. propose the XML-CNN architecture for multi-label classification. The previous approaches may not work well with a large number of labels and/or a skewed label distribution; XML-CNN is designed to address this problem.
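As a quick illustration of the fastText idea described above, here is a minimal sketch, assuming PyTorch. The class name and hyperparameters are ours, not from the paper; nn.EmbeddingBag with mode="mean" performs the averaging of word embeddings:

```python
import torch
import torch.nn as nn

class FastTextLike(nn.Module):
    """Minimal fastText-style classifier: average word embeddings, then softmax."""
    def __init__(self, vocab_size, embed_dim, num_classes):
        super().__init__()
        # EmbeddingBag with mode="mean" averages the embeddings in each document.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        doc_vec = self.embedding(token_ids, offsets)  # averaged document vector
        return self.fc(doc_vec)  # class logits; softmax is applied in the loss
```

Training would typically use nn.CrossEntropyLoss on the logits; for a multi-label variant, the softmax/cross-entropy pair can be swapped for per-label sigmoids with binary cross-entropy.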

The following diagram illustrates the XML-CNN architecture:

In the preceding diagram, the model uses a CNN with dynamic max pooling and a fully connected sigmoid output layer to handle the large label space. Another recent state-of-the-art direction in text classification is attention-based networks, which have shown promising improvements over all of the methods described so far. We will look at one of the recent papers on this in the following topic.
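Before moving on, here is a minimal sketch of an XML-CNN-style model, assuming PyTorch. The dynamic max pooling is approximated with adaptive max pooling that keeps several maxima per filter, and all names and hyperparameter values are illustrative rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLabelCNN(nn.Module):
    """CNN with chunked (dynamic) max pooling and sigmoid outputs per label."""
    def __init__(self, vocab_size, embed_dim, num_labels,
                 num_filters=128, kernel_sizes=(2, 4, 8),
                 pool_chunks=8, bottleneck_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.pool_chunks = pool_chunks
        # Bottleneck layer keeps the final layer small despite the huge label space.
        self.bottleneck = nn.Linear(
            num_filters * pool_chunks * len(kernel_sizes), bottleneck_dim
        )
        self.out = nn.Linear(bottleneck_dim, num_labels)

    def forward(self, token_ids):
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed, seq)
        pooled = []
        for conv in self.convs:
            h = F.relu(conv(x))
            # Dynamic max pooling: keep one maximum per chunk of the sequence,
            # instead of a single max over the whole document.
            pooled.append(F.adaptive_max_pool1d(h, self.pool_chunks))
        h = torch.cat(pooled, dim=1).flatten(1)
        h = F.relu(self.bottleneck(h))
        return torch.sigmoid(self.out(h))  # independent per-label probabilities
```

Because the output units are independent sigmoids rather than a softmax, the model can be trained with binary cross-entropy (nn.BCELoss), and each label's probability can then be thresholded independently.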
