Questions tagged [information-theory]
A branch of mathematics/statistics used to determine the information-carrying capacity of a channel, whether one used for communication or one defined in an abstract sense. Entropy is one of the measures by which information theorists quantify the uncertainty involved in predicting a random variable.
674 questions
0 votes · 0 answers · 12 views
Mutual Information of nonadjacent nodes in Bayesian Network
How do you compute the mutual information of two non-adjacent nodes in a Bayesian network?
In this case, what would $I(D;A)$ be? Would I need to take the conditional probabilities of all intermediate ...
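For a discrete network, one route is to marginalize the full joint over every intermediate node to obtain $P(D,A)$ and then apply the definition of mutual information directly. A minimal sketch, assuming a hypothetical chain $A \to B \to D$ with made-up CPTs (the question's actual network and tables will differ):

```python
import numpy as np

# Toy chain A -> B -> D; the names and CPTs are hypothetical, chosen
# only to illustrate marginalizing out an intermediate node.
p_a = np.array([0.6, 0.4])                      # P(A)
p_b_given_a = np.array([[0.9, 0.1],             # P(B|A=0)
                        [0.2, 0.8]])            # P(B|A=1)
p_d_given_b = np.array([[0.7, 0.3],             # P(D|B=0)
                        [0.1, 0.9]])            # P(D|B=1)

# Joint P(A,B,D), then marginalize out the intermediate node B.
joint_abd = p_a[:, None, None] * p_b_given_a[:, :, None] * p_d_given_b[None, :, :]
p_ad = joint_abd.sum(axis=1)                    # P(A,D)

# I(D;A) = sum_{a,d} P(a,d) log2[ P(a,d) / (P(a)P(d)) ]
p_d = p_ad.sum(axis=0)
mi = np.sum(p_ad * np.log2(p_ad / (p_a[:, None] * p_d[None, :])))
print(f"I(D;A) = {mi:.4f} bits")
```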
0 votes · 0 answers · 20 views
Differential entropy for comparing distributions
I want to use differential entropy to compare the outcomes of Bayesian updating (multidimensional probability distributions) for different datasets. My parameters are different physical parameters, i.e. ...
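If the posteriors are represented by samples, SciPy's `differential_entropy` gives a quick sample-based estimate (for one dimension; a multivariate posterior needs, e.g., a k-NN estimator). A minimal sketch with made-up Gaussian "posterior" samples standing in for real MCMC draws:

```python
import numpy as np
from scipy.stats import differential_entropy

rng = np.random.default_rng(0)

# Hypothetical 1-D posterior samples for the same physical parameter
# obtained from two different datasets.
posterior_dataset1 = rng.normal(loc=2.0, scale=0.5, size=5000)
posterior_dataset2 = rng.normal(loc=2.1, scale=1.5, size=5000)

# Lower differential entropy ~ tighter (more informative) posterior.
h1 = differential_entropy(posterior_dataset1)   # nats by default
h2 = differential_entropy(posterior_dataset2)
print(f"H(dataset 1 posterior) = {h1:.3f} nats")
print(f"H(dataset 2 posterior) = {h2:.3f} nats")

# Analytic check for a Gaussian: 0.5 * ln(2*pi*e*sigma^2)
print(0.5 * np.log(2 * np.pi * np.e * 0.5**2))
```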
2 votes · 3 answers · 104 views
Are feature extraction and dimensionality reduction kinds of compression?
I'm struggling to understand what these terms have in common:
Feature extraction
Feature selection
Compression
Dimensionality reduction
Relatedly, the information / entropy in our data should always ...
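One way to make the analogy concrete is that PCA is a lossy compressor: it stores far fewer numbers and pays a measurable reconstruction error. A sketch on synthetic low-rank data (all names and sizes are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# 500 samples in 50 dimensions whose variance lives in ~5 directions.
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 50))

# Dimensionality reduction as lossy compression: keep 5 of 50 dims...
pca = PCA(n_components=5).fit(X)
X_compressed = pca.transform(X)              # the "compressed" code
X_restored = pca.inverse_transform(X_compressed)

# ...and measure how much information (variance) the code retains.
print(f"stored values: {X_compressed.size} vs original {X.size}")
print(f"explained variance ratio: {pca.explained_variance_ratio_.sum():.4f}")
print(f"reconstruction MSE: {np.mean((X - X_restored)**2):.5f}")
```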
1 vote · 0 answers · 27 views
Fourier transform in information transfer in biological neural networks
Principles of Neural Design by Peter Sterling and Simon Laughlin describes the use of information theory to calculate the rate of information transfer in the brain.
...when successive signal states ...
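For context, rate estimates of this kind typically trace back to the Shannon–Hartley capacity of a band-limited Gaussian channel. A generic sketch of that formula, not the book's exact neural calculation, with hypothetical numbers:

```python
import numpy as np

# Shannon-Hartley capacity of a band-limited Gaussian channel:
# C = B * log2(1 + S/N) bits per second.
def channel_capacity(bandwidth_hz, snr_linear):
    return bandwidth_hz * np.log2(1.0 + snr_linear)

# Hypothetical bandwidth and SNR, for illustration only.
print(channel_capacity(bandwidth_hz=100.0, snr_linear=10.0))  # ~345.9 bits/s
```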
0 votes · 0 answers · 4 views
Sample Complexity of BHT with varying degrees of (large) compression
In Communication-constrained hypothesis testing: Optimality, robustness, and reverse data processing inequalities, the following (up to some mild editing to highlight my question) is established.
...
0 votes · 0 answers · 18 views
Interpretation of time series spectral entropy values w.r.t. forecastability by a general neural network
I recently started using spectral entropy to analyze (already windowed) time series. I'm having difficulty interpreting the results: the entropy of the last 25% of a series is 0.19, and the ...
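For reference on the interpretation: normalized spectral entropy near 0 means power is concentrated in a few frequencies (more structure, typically more forecastable), while values near 1 mean a flat, noise-like spectrum. A sketch of one common definition, assuming a plain periodogram estimate:

```python
import numpy as np
from scipy.signal import periodogram

def spectral_entropy(x, fs=1.0, normalize=True):
    """Shannon entropy of the normalized power spectrum of x."""
    _, psd = periodogram(x, fs=fs)
    psd = psd[psd > 0]
    p = psd / psd.sum()                      # spectrum as a distribution
    h = -np.sum(p * np.log2(p))
    if normalize:
        h /= np.log2(len(p))                 # scale to [0, 1]
    return h

rng = np.random.default_rng(0)
t = np.arange(1024)
# On-bin pure tone: entropy near 0.  White noise: entropy near 1.
print(spectral_entropy(np.sin(2 * np.pi * 0.0625 * t)))
print(spectral_entropy(rng.normal(size=1024)))
```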
2 votes · 0 answers · 41 views
Shannon source coding theorem and differential entropy
Loosely speaking, Shannon's source encoding theorem says that there is an encoder with rate at least $H(X)$ such that $n$ repetitions of the source can be mapped to at least $nH(X)$ bits of binary ...
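The benchmark in the theorem is easy to check empirically: a good general-purpose compressor should approach $H(X)$ bits per symbol on i.i.d. data. A sketch using a Bernoulli source and bz2 (the compressor choice is arbitrary):

```python
import bz2
import numpy as np

rng = np.random.default_rng(0)
p, n = 0.1, 200_000                       # Bernoulli(0.1) source, n symbols

# Entropy of the source: H(X) = -p log2 p - (1-p) log2 (1-p)
h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)   # ~0.469 bits/symbol

# Compress n i.i.d. draws and compare with the nH(X) benchmark.
bits = rng.random(n) < p
compressed = bz2.compress(np.packbits(bits).tobytes())
print(f"H(X)            = {h:.4f} bits/symbol")
print(f"compressed rate = {8 * len(compressed) / n:.4f} bits/symbol")
```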
1 vote · 1 answer · 59 views
Chain rule for conditional entropy
A textbook I am reading states that $$H(X,Y)=H(X)+H(Y|X)$$ where $H(X,Y)$ is the joint entropy of the random variables $X,Y$, $H(X)$ is the entropy of $X$, and $H(Y|X)$ is the conditional entropy. It then states ...
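The identity is straightforward to verify numerically for any small joint pmf, which can help when working through the textbook's steps. A sketch with an arbitrary $2 \times 2$ joint distribution:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Any small joint pmf over (X, Y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

p_x = p_xy.sum(axis=1)
h_xy = entropy(p_xy.ravel())                 # H(X,Y)
h_x = entropy(p_x)                           # H(X)
h_y_given_x = h_xy - h_x                     # H(Y|X) via the chain rule

# Direct computation: H(Y|X) = sum_x P(x) H(Y | X=x)
h_direct = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(2))
print(np.isclose(h_y_given_x, h_direct))     # True
```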
0 votes · 0 answers · 21 views
How do you choose to put a distribution on the right or left of KL divergence? [duplicate]
I always thought of KL divergence as a distance metric between distributions, much like the Earth Mover's distance. But I can no longer ignore the asymmetry: a real distance metric is symmetric.
How should ...
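The asymmetry is easy to exhibit numerically; `scipy.stats.entropy(p, q)` returns $KL(P\|Q)$, so swapping the arguments gives a genuinely different number. A minimal sketch with arbitrary distributions:

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) computes KL(p || q)

p = np.array([0.80, 0.15, 0.05])
q = np.array([0.40, 0.40, 0.20])

print(entropy(p, q))   # KL(P || Q)
print(entropy(q, p))   # KL(Q || P): a different value, since KL is asymmetric
```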
1 vote · 0 answers · 28 views
Mutual Information decay
Consider $m$ channels indexed by $i$ with $1 \leq i \leq m$. The input alphabets are from the same finite set $\mathcal{X}$. Let $\pi$ denote a probability distribution on $\mathcal{X}$. Define the ...
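Although the question's setup is truncated here, the standard tool behind such decay statements is the data processing inequality: composing channels can only destroy information about the input. A sketch showing $I(X;Y_i)$ shrinking as binary symmetric channel steps are concatenated (the BSC choice is illustrative):

```python
import numpy as np

def mutual_information(p_x, channel):
    """I(X;Y) for input pmf p_x and row-stochastic channel P(Y|X)."""
    p_xy = p_x[:, None] * channel
    p_y = p_xy.sum(axis=0)
    mask = p_xy > 0
    outer = p_x[:, None] * p_y[None, :]
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / outer[mask]))

pi = np.array([0.5, 0.5])                 # input distribution on {0, 1}
bsc = np.array([[0.9, 0.1],               # one binary symmetric channel step
                [0.1, 0.9]])

# Data processing inequality: I(X; Y_i) is non-increasing in i.
channel = np.eye(2)
for i in range(1, 6):
    channel = channel @ bsc               # i concatenated BSC steps
    print(f"i = {i}: I = {mutual_information(pi, channel):.4f} bits")
```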
3 votes · 0 answers · 283 views
Minimum Description Length, Normalized Maximum Likelihood, and Maximum A Posteriori Estimation
TL;DR: I believe MDL using NML is a special case of the joint MAP of model and parameters, and I need to verify this and find sources that have acknowledged it.
This is how I understand Minimum ...
0 votes · 1 answer · 43 views
Is there any work linking information channel theory to statistical inference?
I wonder what the theoretical limit of a statistical inference problem is. For example, we have a model with many parameters, and we can sample many data points from the model. This can be viewed as a ...
1 vote · 0 answers · 28 views
Choosing the number of lags AND the model form for the Augmented Dickey-Fuller test
Before running an Augmented Dickey-Fuller (ADF) test, one has to answer two questions: how many lags $p$ to include in the model, AND which model to choose among the following:
No constant, no trend
...
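In practice both choices can be passed directly to `statsmodels`' `adfuller`, which also selects the lag order $p$ by an information criterion. A sketch assuming a recent statsmodels version (the random-walk series is illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))       # a random walk, for illustration

# regression: 'n' = no constant/trend, 'c' = constant,
# 'ct' = constant + trend; autolag picks the lag order by AIC.
for spec in ("n", "c", "ct"):
    stat, pvalue, usedlag, *_ = adfuller(y, regression=spec, autolag="AIC")
    print(f"model {spec!r}: ADF stat = {stat:.3f}, p = {pvalue:.3f}, lags = {usedlag}")
```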
1 vote · 1 answer · 44 views
Minimum entropy decomposition of probability distributions
Say you want to decompose a probability distribution (a PDF) into a mixture of distributions in such a way as to minimize the mean entropy of the component distributions. I have an idea that this is ...
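One way to make the objective concrete: for a Gaussian mixture the component entropies are available in closed form, so any candidate decomposition can be scored. Note that EM maximizes likelihood, which is not the same as minimizing the mean component entropy; the sketch below (with made-up bimodal data) only evaluates the objective for one candidate decomposition:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Bimodal data drawn from a known two-component mixture.
x = np.concatenate([rng.normal(-3, 0.5, 2000), rng.normal(3, 1.0, 2000)])

gm = GaussianMixture(n_components=2, random_state=0).fit(x.reshape(-1, 1))

# Differential entropy of a 1-D Gaussian: 0.5 * ln(2*pi*e*sigma^2).
sigmas = np.sqrt(gm.covariances_.ravel())
component_entropies = 0.5 * np.log(2 * np.pi * np.e * sigmas**2)

# Weight-averaged component entropy: the quantity the question
# wants to minimize over candidate decompositions.
print(np.sum(gm.weights_ * component_entropies))
```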
1 vote · 0 answers · 16 views
Effect on entropy when we scale Bernoulli plus Gaussian
Question: Given $X\sim\text{Bernoulli}(\alpha)$, $Y\sim\mathcal{N}(0,1)$, and non-random positive constants $C,\epsilon>0$. Let $H(\cdot)$ be the differential entropy. Is it true that
$$
H((C+\...