Scene text detection

The scene text detection algorithm builds a component tree of an image by thresholding it step by step from 0 to 255. To enhance the results, this process is applied to each color channel, as well as to the intensity and gradient magnitude images. The connected components obtained at successive threshold levels are then organized hierarchically according to their inclusion relationships, as shown in the following diagram. This tree organization may contain a huge number of regions:

Tree organization example
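
As an intuition of how the tree levels arise, the following minimal sketch (a hypothetical helper, not part of the algorithm's API) thresholds a grayscale image at successive levels and counts the connected components found at each one:

#include "opencv2/opencv.hpp"
#include <iostream>
using namespace cv;

// Threshold a grayscale image at successive levels and report how many
// connected components appear at each level
void showThresholdLevels(const Mat& gray, int delta)
{
    for (int level = 0; level < 256; level += delta)
    {
        Mat bw;
        // Pixels darker than the current level form the candidate regions
        threshold(gray, bw, level, 255, THRESH_BINARY_INV);
        Mat labels;
        int n = connectedComponents(bw, labels, 4);
        std::cout << "Level " << level << ": " << (n - 1)
                  << " components" << std::endl;
    }
}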

Thus, the algorithm selects regions in two stages. In the first stage, area, perimeter, bounding box, and Euler number descriptors are computed for each region and used to estimate a class-conditional probability. Extremal regions with locally maximal probabilities are selected if their values are above a global limit and the difference between their local maximum and minimum is also above a specified limit.
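
The following minimal sketch illustrates this selection rule on a hypothetical sequence of class-conditional probabilities along one branch of the tree (the values and thresholds are made up for illustration):

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    // Hypothetical class-conditional probabilities along one tree branch
    std::vector<float> p = {0.05f, 0.20f, 0.55f, 0.40f, 0.10f};
    const float minProbability = 0.4f;      // global limit
    const float minProbabilityDiff = 0.1f;  // required local max-min gap

    float localMin = *std::min_element(p.begin(), p.end());
    for (std::size_t i = 1; i + 1 < p.size(); i++)
    {
        bool isLocalMax = p[i] > p[i - 1] && p[i] > p[i + 1];
        if (isLocalMax && p[i] > minProbability &&
            p[i] - localMin > minProbabilityDiff)
            std::cout << "Select the region at position " << i << std::endl;
    }
    return 0;
}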

The second stage classifies the extremal regions selected in the first stage into character and non-character classes, using hole area ratio, convex hull ratio, and the number of outer boundary inflexion points as features.
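
As an illustration of one of these features, the following sketch computes the convex hull ratio for a region given as a contour (the contour is a hypothetical input; the real filter derives its features internally from the ERStat data):

#include "opencv2/opencv.hpp"
#include <vector>
using namespace cv;

// Convex hull ratio of a region given as a contour: the region area
// divided by the area of its convex hull (close to 1 for convex shapes)
double convexHullRatio(const std::vector<Point>& contour)
{
    std::vector<Point> hull;
    convexHull(contour, hull);
    double hullArea = contourArea(hull);
    return hullArea > 0 ? contourArea(contour) / hullArea : 0;
}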

Finally, the selected extremal regions are grouped to obtain words, lines, or paragraphs. This part of the algorithm uses perceptual-organization-based clustering analysis.
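
The library performs this grouping internally (through erGrouping, shown later). As a rough intuition only, the following naive sketch merges character boxes that are vertically aligned and horizontally close; it is not the perceptual-organization clustering the algorithm actually uses:

#include "opencv2/opencv.hpp"
#include <algorithm>
using namespace cv;

// Naive illustration: two character boxes belong to the same group when
// they overlap vertically and the horizontal gap between them is small
bool sameGroup(const Rect& a, const Rect& b)
{
    int verticalOverlap = std::min(a.br().y, b.br().y) -
                          std::max(a.tl().y, b.tl().y);
    int horizontalGap = std::max(a.tl().x, b.tl().x) -
                        std::min(a.br().x, b.br().x);
    return verticalOverlap > 0 && horizontalGap < a.height;
}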

The following textDetection example illustrates how to use the scene text detection algorithm to localize text in an image:

#include "opencv2/opencv.hpp"
#include "opencv2/objdetect.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/imgproc.hpp"

#include <vector>
#include <iostream>
#include <iomanip>

using namespace std;
using namespace cv;

int main(int argc, const char * argv[]){

    Mat src = imread(argv[1]);

    vector<Mat> channels;
    computeNMChannels(src, channels);

    //Negative images from RGB channels
    channels.push_back(255-channels[0]);
    channels.push_back(255-channels[1]); 
    channels.push_back(255-channels[2]);
    channels.push_back(255-channels[3]);
    for (int c = 0; c < channels.size(); c++){
        stringstream ss;
        ss << "Channel: " << c;
        imshow(ss.str(),channels.at(c));
    }

    Ptr<ERFilter> er_filter1 = createERFilterNM1(
                                   loadClassifierNM1(argv[2]),
                                   16, 0.00015f, 0.13f, 0.2f,
true, 0.1f );
    Ptr<ERFilter> er_filter2 = createERFilterNM2(
                                   loadClassifierNM2(argv[3]),  0.5 );

    vector<vector<ERStat> > regions(channels.size());
    // Apply filters to each channel
    for (int c=0; c<(int)channels.size(); c++){
        er_filter1->run(channels[c], regions[c]);
        er_filter2->run(channels[c], regions[c]);
    }
    for (int c=0; c<(int)channels.size(); c++){
        Mat dst = Mat::zeros( channels[0].rows + 
                              2, channels[0].cols + 2, CV_8UC1 );
        // Show ERs
        for (int r=0; r<(int)regions[c].size(); r++)
        {
            ERStat er = regions[c][r];
            if (er.parent != NULL){
                int newMaskVal = 255;
                int flags = 4 + (newMaskVal << 8) + 
                                 FLOODFILL_FIXED_RANGE + 
                                 FLOODFILL_MASK_ONLY;
                floodFill( channels[c], dst, Point(er.pixel % 
                           channels[c].cols,er.pixel / 
                           channels[c].cols), Scalar(255), 0, 
                           Scalar(er.level), Scalar(0), flags);
            }
        }
        stringstream ss;
        ss << "Regions/Channel: " << c;
        imshow(ss.str(), dst);
    }

    vector<Rect> groups;
    erGrouping( channels, regions, argv[4], 0.5, groups );
    for (int i=(int)groups.size()-1; i>=0; i--)
    {
        if (src.type() == CV_8UC3)
            rectangle( src,groups.at(i).tl(), groups.at(i).br(), 
                       Scalar( 0, 255, 255 ), 3, 8 );
        else
            rectangle( src,groups.at(i).tl(), groups.at(i).br(), 
                       Scalar( 255 ), 3, 8 );
    }
    imshow("grouping",src);

    waitKey(-1);
    er_filter1.release();
    er_filter2.release();
    regions.clear();
    groups.clear();
}

The code explanation is as follows:

  • void computeNMChannels(InputArray _src, OutputArrayOfArrays _channels, int _mode=ERFILTER_NM_RGBLGrad): This function computes different channels from the image in _src, to be processed independently in order to obtain a high localization recall. By default (_mode=ERFILTER_NM_RGBLGrad), these channels are red (R), green (G), blue (B), lightness (L), and gradient magnitude (∇); if _mode=ERFILTER_NM_IHSGrad, they are intensity (I), hue (H), saturation (S), and gradient magnitude (∇). The computed channels are saved in the _channels parameter (see the short usage sketch after this list).
  • Ptr<ERFilter> createERFilterNM1(const Ptr<ERFilter::Callback>& cb, int thresholdDelta = 1, float minArea = 0.00025, float maxArea = 0.13, float minProbability = 0.4, bool nonMaxSuppression = true, float minProbabilityDiff = 0.1): This function creates an extremal region filter for the first-stage classifier defined by the algorithm. The first parameter loads the classifier by means of the function loadClassifierNM1(const std::string& filename). The thresholdDelta variable indicates the threshold step used while building the component tree. The parameters minArea and maxArea establish, as percentages of the image size, the range of sizes within which extremal regions are retrieved. The bool parameter nonMaxSuppression is true when non-maximum suppression is applied over the branch probabilities, and false otherwise. Finally, the minProbability and minProbabilityDiff parameters control the minimum probability value and the minimum probability difference between local maxima and minima allowed for retrieving an extremal region.
  • Ptr<ERFilter> createERFilterNM2(const Ptr<ERFilter::Callback>& cb, float minProbability = 0.3): This function creates an extremal region filter for the second-stage classifier defined by the algorithm. The first parameter loads the classifier by means of the function loadClassifierNM2(const std::string& filename). The other parameter, minProbability, is the minimum probability allowed for retrieving extremal regions.
  • void ERFilter::run( InputArray image, std::vector<ERStat>& regions): This method applies the classifier loaded by the filter to obtain the extremal regions in either the first or the second stage. The image parameter is the channel to be examined, and regions holds the output of the first stage and is also the input/output of the second.
  • void erGrouping(InputArrayOfArrays src, std::vector<std::vector<ERStat>>& regions, const std::string& filename, float minProbability, std::vector<Rect>& groups): This function groups the extremal regions obtained. It uses the extracted channels (src), the extremal regions obtained for each channel (regions), the path to the grouping classifier (filename), and the minimum probability for accepting a group (minProbability). The final groups, represented as Rect rectangles, are stored in the vector groups.
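
For instance, selecting the alternative channel set mentioned above only requires passing the corresponding mode to computeNMChannels:

vector<Mat> channels;
computeNMChannels(src, channels, ERFILTER_NM_IHSGrad);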

The following group of screenshots shows the obtained image channels. These are red (R), green (G), blue (B), lightness (L), gradient magnitude (∇), inverted red (iR), inverted green (iG), inverted blue (iB), and inverted lightness (iL). In the first row, the R, G, and B channels are shown. The second row shows the L, ∇, and iR channels. Finally, in the third row, the iG, iB, and iL channels are shown:

Extracted image channels

The following group of screenshots shows the extremal regions extracted from each channel. The R, G, B, L, and ∇ channels produce the most accurate results. In the first row, the extremal regions from the R, G, and B channels are shown. The second row shows the extremal regions extracted from the L, ∇, and iR channels. Finally, in the third row, those from the iG, iB, and iL channels are shown:

Extremal regions obtained from each channel

Finally, the following screenshot shows the input image with the text areas grouped into lines and paragraphs:

Groups obtained

Note

To reproduce these results or use the OpenCV scene text detector, it is possible to run this code with the sample files provided by the library. The input image and classifiers can be found in the OPENCV_SCR/samples/cpp directory. The image used here is scenetext01.jpg. The first- and second-stage classifiers are trained_classifierNM1.xml and trained_classifierNM2.xml. Finally, the grouping classifier provided by OpenCV is trained_classifier_erGrouping.xml.
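
For example, assuming a Linux setup with pkg-config (the exact compiler flags depend on your installation), the example can be built and run as follows:

g++ textDetection.cpp -o textDetection `pkg-config --cflags --libs opencv`
./textDetection scenetext01.jpg trained_classifierNM1.xml \
    trained_classifierNM2.xml trained_classifier_erGrouping.xml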
