
The structure of the question is as follows: first I introduce the concept of collective recognition, then I explain the various methods of group classification that I found, and at the end I state the question itself. Readers who are experts in this field and do not need the explanations can just skim the headings and go straight to the question.


What is collective recognition/classification?

By collective recognition I mean the task of using multiple classifiers (a committee, an ensemble, etc.), each of which decides on the class of an entity, with their decisions subsequently combined by some coordination algorithm. Using a set of classifiers typically leads to higher recognition accuracy and better computational efficiency.
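To make the idea concrete, here is a minimal sketch of such a committee on the digits data set bundled with scikit-learn (the library, the particular base models, and the majority-vote rule are my own illustrative choices, not something taken from my sources): three base classifiers each decide on the class of an instance, and a simple majority vote coordinates their decisions.

    from sklearn.datasets import load_digits
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Committee of three base classifiers; "hard" voting = majority rule.
    committee = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression(max_iter=1000)),
            ("knn", KNeighborsClassifier()),
            ("tree", DecisionTreeClassifier(random_state=0)),
        ],
        voting="hard",
    )
    committee.fit(X_train, y_train)
    print("committee accuracy:", committee.score(X_test, y_test))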

Some approaches to combining the decisions of multiple classifiers:

  1. methods based on the concept of classifiers' competence areas, together with procedures that assess the competence of each classifier with respect to each input of the classification system;
  2. methods that combine the classifiers' decisions by means of neural networks.

Competence areas method

The idea of collective classification based on competence areas is that each base classifier can work well in some region of the feature space (its area of competence), outperforming the remaining classifiers there in terms of accuracy and reliability. The area of competence of each base classifier must somehow be estimated; the program that does this is called a referee. The classification task is then solved so that each algorithm is used only within its own area of competence, i.e. where it produces the best results compared to the other classifiers, and within each area the decision of only one classifier is taken into account. This requires an algorithm that, for any input, determines which of the classifiers is the most competent.

So, one approach associates each classifier with a special algorithm (the referee) designed to assess that classifier's competence. The competence of a classifier in a given region of the object representation space is its accuracy there, i.e. the probability of correctly classifying objects whose descriptions belong to that region.

The general training scheme for collective recognition based on competence assessment consists of two steps (Fig. 1). In the first step, each individual base classifier is trained and tested; this step does not differ from the conventional training scheme. In the second step, the sample used to test a given classifier is divided into two subsets, L+ and L-. The first subset contains the instances of the test sample that were classified correctly during testing; the second contains the remaining instances, i.e. those that were classified incorrectly. Treating these two sets as the classifier's areas of competence and incompetence, they can be used as training data for the referee algorithm. When classifying new data, the task of the referee is to determine, for each input instance, whether it belongs to its classifier's area of competence and, if it does, to estimate the probability that this particular instance will be classified correctly. The referee then assigns the classification task to the most competent classifier.
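Below is a rough sketch of this two-step scheme, assuming scikit-learn; the particular base classifiers and the choice of logistic regression as the referee model are purely illustrative.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_ref, y_train, y_ref = train_test_split(X, y, test_size=0.5, random_state=0)

    base_classifiers = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
    referees = []

    for clf in base_classifiers:
        # Step 1: ordinary training of the base classifier.
        clf.fit(X_train, y_train)
        # Step 2: split the held-out sample into L+ (classified correctly) and
        # L- (classified incorrectly) and train a referee to recognize L+.
        in_competence_area = (clf.predict(X_ref) == y_ref).astype(int)
        referees.append(LogisticRegression(max_iter=1000).fit(X_ref, in_competence_area))

    def classify(x):
        # The referees estimate each classifier's probability of being correct
        # on this particular input; the most competent classifier decides.
        x = x.reshape(1, -1)
        competence = [r.predict_proba(x)[0, 1] for r in referees]
        return base_classifiers[int(np.argmax(competence))].predict(x)[0]

    print(classify(X_ref[0]), "true class:", y_ref[0])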


Neural network approaches

Neural network approaches to collective classification can be divided into methods that use a neural network to combine multiple classifiers, ensembles of neural networks, and neural networks constructed from modules.

A neural network for combining classifiers

One approach uses a neural network to combine the decisions of several base classifiers (Fig. 2). The output of each base classifier is a decision vector (a vector of "soft tag" values) whose values lie in some numerical interval [a, b]. These values are fed to the input of a neural network that has been trained to combine the base classifiers' decisions. The output of the neural network is the final decision in favor of one class or another. The output can also be a vector whose dimension equals the number of classes to be recognized and whose positions hold confidence values in favor of the corresponding classes; in that case, the class with the maximum confidence value is selected as the decision.


Decision integration works as follows (a minimal sketch in code follows the list):

  1. a number of base classifiers are selected and trained;
  2. metadata for training the neural network is prepared: the base classifiers are tested on a labeled data sample, a vector of base classifier decisions is generated for each test instance, and a component holding the true class of that instance is appended to the decision vector;
  3. the metadata sample is used to train the neural network that performs decision integration.
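A minimal sketch of these three steps, assuming scikit-learn; here MLPClassifier stands in for the combining neural network, and class probabilities play the role of the "soft tag" vectors.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_base, X_meta, y_base, y_meta = train_test_split(X, y, test_size=0.5, random_state=0)

    # Step 1: select and train the base classifiers.
    base = [LogisticRegression(max_iter=1000), KNeighborsClassifier()]
    for clf in base:
        clf.fit(X_base, y_base)

    # Step 2: build the metadata sample. For every instance of the held-out
    # sample, concatenate the base classifiers' decision vectors; the true
    # class y_meta is the extra component added to the metadata.
    meta_features = np.hstack([clf.predict_proba(X_meta) for clf in base])

    # Step 3: train the neural network that performs decision integration.
    combiner = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    combiner.fit(meta_features, y_meta)

    def predict(x):
        soft_tags = np.hstack([clf.predict_proba(x.reshape(1, -1)) for clf in base])
        return combiner.predict(soft_tags)[0]

    print(predict(X_meta[0]), "true class:", y_meta[0])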

Modular neural networks method

For modular neural networks it is proposed to use a so-called gating network, i.e. a neural network that assesses the classifiers' competence for a particular input vector. This is the neural network embodiment of competence-based decision integration; the corresponding theory is known as mixture of experts. Each classifier in the set of base classifiers is associated with a "referee" that predicts its degree of competence with respect to a specific input (Fig. 3).

Depending on the input vector X, the decisions of different classifiers can be selected and combined to obtain the final decision. The number of inputs of the gating network equals the dimension of the input feature vector, and the number of its outputs equals the number of classifiers L. Given a specific input vector, the gating network is trained to predict the degree of competence of each classifier, i.e. an estimate of the probability that the classifier will produce the correct decision. The degree of competence is represented by a number in the interval [0, 1].
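A sketch of the gating idea, again assuming scikit-learn; a small multi-label MLP plays the role of the gating network, producing one competence value in [0, 1] per expert, and the experts' soft outputs are weighted by those values. All model choices and names here are illustrative.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_gate, y_train, y_gate = train_test_split(X, y, test_size=0.5, random_state=0)

    experts = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
    for expert in experts:
        expert.fit(X_train, y_train)

    # Gating targets: for each instance, a binary vector saying which experts
    # classified it correctly.
    gate_targets = np.column_stack(
        [(expert.predict(X_gate) == y_gate).astype(int) for expert in experts])

    # Gating network: number of inputs = feature dimension, outputs = L experts.
    gate = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
    gate.fit(X_gate, gate_targets)

    def predict(x):
        x = x.reshape(1, -1)
        competence = gate.predict_proba(x)[0]        # one value in [0, 1] per expert
        weighted = sum(c * e.predict_proba(x)[0] for c, e in zip(competence, experts))
        return int(np.argmax(weighted))              # combined decision

    print(predict(X_gate[0]), "true class:", y_gate[0])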


Ensembles of neural networks

Another proposal is a decision integration architecture consisting of several "experts" (neural networks). Combining the knowledge of neural networks in an ensemble has proved effective and demonstrates how collective recognition technology can be used to overcome the problem of "fragility".

An ensemble of neural networks is a set of neural network models that makes a decision by averaging the results of the individual models. Depending on how the ensemble is constructed, it addresses one of two problems: the tendency of the base neural network architecture to underfit (addressed by the boosting meta-algorithm) or its tendency to overfit (addressed by the bagging meta-algorithm).
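The two meta-algorithms mentioned above, sketched with scikit-learn; plain decision trees stand in here for the base neural network architecture.

    from sklearn.datasets import load_digits
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_digits(return_X_y=True)

    # Bagging averages many high-variance models to counteract overfitting.
    bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)

    # Boosting chains many high-bias ("weak") models to counteract underfitting.
    boosting = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                  n_estimators=50, random_state=0)

    for name, model in [("bagging", bagging), ("boosting", boosting)]:
        print(name, cross_val_score(model, X, y, cv=3).mean())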

There are various universal voting schemes, in which the winner is the class (a short sketch follows the list):

  1. maximum - with the maximum single response among the ensemble members;
  2. average - with the highest average response of the ensemble members;
  3. majority - with the largest number of votes from the ensemble members.
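A compact illustration of the three schemes with NumPy; `responses` holds the soft outputs of three hypothetical ensemble members for a single input over three classes.

    import numpy as np

    # responses[i, j] = response of ensemble member i for class j.
    responses = np.array([[0.1, 0.7, 0.2],
                          [0.3, 0.4, 0.3],
                          [0.6, 0.3, 0.1]])

    maximum_winner = np.argmax(responses.max(axis=0))    # largest single response
    average_winner = np.argmax(responses.mean(axis=0))   # highest average response
    majority_winner = np.argmax(np.bincount(responses.argmax(axis=1)))  # most votes

    print(maximum_winner, average_winner, majority_winner)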

My question

Which scheme of collective recognition is most suitable for symbol/digit recognition?

I will mostly be working with digits. The sources from which I took the information about the various collective classification schemes date from 2006, and I am afraid some of the methods may have become obsolete.

I have mentioned the following methods:

  • on the basis of competence areas
  • on the basis of using a neural network to combine classifiers
  • on the basis of modular neural networks
  • on the basis of neural network ensembles.

Which method can potentially achieve better classification accuracy and performance for symbol/digit recognition?

Or are there newer, more efficient methods of collective recognition/classification that I have not mentioned here?

5 Comments

  • You may want to use some actual question marks. – Raphael, Aug 29, 2016 at 1:59
  • This seems at the same time quite broad ("any information you could give me") but also potentially subjective. What do you mean by "more suitable/rational/reasonable"? Community votes, please: too broad? Subjective? – Raphael, Aug 29, 2016 at 2:00
  • Cross-posted on Stack Overflow, CS.SE, and AI.SE. Please do not post the same question on multiple sites. Each community should have an honest shot at answering without anybody's time being wasted. – D.W., Aug 29, 2016 at 2:24
  • @ErbaAitbayev'KZ', I see that perhaps I wasn't sufficiently clear. You are not allowed to post your question on multiple Stack Exchange sites. It doesn't matter if you felt your question wouldn't receive enough attention or if one of the sites is in Beta; it's still not allowed. Nowhere in the question you linked to does it say it's OK to post on multiple SE sites. You might want to refer to the link I provided for our policy. – D.W., Aug 29, 2016 at 2:31
  • The competence split in the case of digits sounds like a fancy, overthought SVM, which is not the best choice. Anyway, how would you split digits into competence areas? It sounds like theoretical overkill. – Evil, Aug 29, 2016 at 3:54

1 Answer


The state of the art for digit recognition does not use collective recognition, competence areas, ensembles, or any of the other ideas you propose in your question.

Instead, the state of the art for digit recognition uses convolutional neural networks. Just a convolutional neural network: no need for multiple of them, no need for any kind of other fancy business on top of it. Instead, the state of the art focuses on the specific architecture of the convolutional neural network (e.g., how many layers? what types of pooling?) and on the training procedures (e.g., adjusting the learning rate in stochastic gradient descent, dropout, batch normalization, and more).
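For what it's worth, a minimal convolutional network of that kind fits in a few lines; the sketch below assumes TensorFlow/Keras and the MNIST data set, and the exact layer sizes are only illustrative.

    import tensorflow as tf

    # A small convolutional network for 28x28 grayscale digit images.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.5),      # one of the training tricks mentioned above
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0
    model.fit(x_train, y_train, epochs=3, validation_data=(x_test, y_test))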

As far as I know, the same is true for the state of the art for object/symbol recognition.

So, my advice is: while all those concepts sound nice on paper, they don't seem to actually be needed or useful for the tasks you care about. You're better off ignoring those concepts. Sorry to be the bearer of bad news when you've obviously put a bunch of effort into typing up an explanation of those concepts.

