$\begingroup$

Background

Given a noisy dataset $D$, I have to solve a classification problem where the possible answer is $i\in\{1,\dots,N\}$. So far I get fairly good results with an algorithm that, based on the observed dataset $D$, associates with each category $i$ a cost $c(i,D)\in[0,\infty)$. The estimate is then simply given by \begin{equation*} \hat{i} \triangleq \arg\min_{i\in\{1,\dots,N\}} c(i, D) \end{equation*} Since the method relies only on the observed dataset $D$, I believe I can make the estimate more robust by introducing some kind of prior information. I'm not sure how to achieve this, but I have the following idea:

  • step 1: turn the list of costs $\{c(i,D)\}_{i=1}^N$ into a probability distribution $\{\mathcal{L}(i, D)\}_{i=1}^N$, which I'm tempted to call a likelihood. The first thing that comes to mind for defining the "likelihood" is the following softmax-like transformation of the list of costs \begin{equation*} \mathcal{L}(i,D) \triangleq \frac{\exp\left(-c(i,D)\right)}{\sum_{j=1}^N \exp\left(-c(j,D)\right)} \end{equation*}

  • step 2: according to some rule, fuse the likelihood $\{\mathcal{L}(i, D)\}_{i=1}^N$ with a second, given probability distribution $\{p_0(i)\}_{i=1}^N$ encoding the prior information. The result is a new probability distribution $\{p(i, D)\}_{i=1}^N$. The first thing that comes to mind for defining the "posterior" is to apply Bayes' theorem \begin{equation*} p(i,D) \triangleq \frac{\mathcal{L}(i,D)\,p_0(i)}{\sum_{j=1}^N \mathcal{L}(j,D)\,p_0(j)} \end{equation*}

  • step 3: according to some rule, extract the estimate $\hat{i}$ from the "posterior" distribution $\{p(i, D)\}_{i=1}^N$. The first thing that comes to mind is to take the mode \begin{equation*} \hat{i} \triangleq \arg \max_{i\in\{1,\dots,N\}} p(i,D) \end{equation*}
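
To make the three steps concrete, here is a minimal Python/NumPy sketch of what I have in mind. The function names (`softmax_likelihood`, `posterior`, `map_estimate`) and the example numbers are made up for illustration and are not my actual costs or prior. Subtracting the smallest cost before exponentiating is only a numerical-stability choice; it cancels in the normalization, so the result matches the formula in step 1.

```python
import numpy as np


def softmax_likelihood(costs):
    """Step 1: map costs c(i, D) to weights proportional to exp(-c(i, D))."""
    costs = np.asarray(costs, dtype=float)
    # Shift by the minimum cost so the largest exponent is 0 (avoids underflow
    # of every term); the shift cancels after normalization.
    weights = np.exp(-(costs - costs.min()))
    return weights / weights.sum()


def posterior(costs, prior):
    """Step 2: combine the 'likelihood' with the prior p_0 via Bayes' rule."""
    prior = np.asarray(prior, dtype=float)
    unnormalized = softmax_likelihood(costs) * prior
    return unnormalized / unnormalized.sum()


def map_estimate(costs, prior):
    """Step 3: return the mode of the 'posterior' as a category in 1..N."""
    return int(np.argmax(posterior(costs, prior))) + 1


# Hypothetical example values, chosen only to show the effect of the prior.
example_costs = [1.0, 1.2, 3.0]
example_prior = [0.2, 0.7, 0.1]
estimate_with_prior = map_estimate(example_costs, example_prior)
estimate_uniform_prior = map_estimate(example_costs, [1 / 3, 1 / 3, 1 / 3])
```

With these made-up numbers, a uniform prior reproduces the plain $\arg\min$ rule (category 1), while the non-uniform prior moves the estimate to category 2, because the costs of the first two categories are close and the prior strongly favors the second.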

Questions

  1. Assuming the procedure above makes sense, is there a better way to define the "likelihood" and the "posterior"?
  2. What are the possible problems with my approach? I can provide more details about the estimation problem and how I compute the cost $c(i, D)$.
$\endgroup$
