Mini-Project: Markov Random Fields (MRFs) and Applications
Anmol Dwivedi
1 Introduction
Oftentimes a Bayesian Network (BN) I-map contains extra edges and cannot capture the desired
independence properties. Furthermore, a BN requires directions to be assigned to the edges between
the random variables in the network. Undirected graphical models, in contrast, model a variety of
phenomena where one cannot ascribe a directionality to the interaction between variables. Such
Markov networks have found extensive applications in image processing, computer vision, and related
tasks. Similar to BNs, a Markov network defines the local interactions between directly related
variables, and the global model is constructed by combining the local models. To ensure that we
obtain a valid PDF, a normalization constant, also known as the partition function, is introduced
into the model. The benefit of this representation is the flexibility it allows in representing
interactions between random variables; however, the effects of such changes are not always
intuitively understandable. Consider an undirected graph G = (V, E) whose nodes denote random
variables X_i. The Markov condition for an undirected graphical model is given by
P(X_i | X_{−i}) = P(X_i | X_{N_i}) ,  (1)
where X_{−i} denotes all the random variables except X_i and X_{N_i} denotes the neighbors of
node i. According to the Hammersley–Clifford theorem, the joint probability distribution can be
factorized as
P(X) = (1/Z) · ∏_{c ∈ C} ψ(X_c) ,  (2)
where X_c denotes the variables associated with a maximal clique c in the Markov network and C is
the set of maximal cliques. One convenient form for the potential function is the log-linear form, i.e.,
ψ(X_c) = e^{−w_c · E(X_c)} ,  (3)
where w_c are the parameters of the potential function and E(·) denotes an energy function capable
of capturing interactions between random variables. While MRFs are employed in numerous
applications, image segmentation is one of the most common. MRFs for image segmentation are
modeled as follows:
• Pixels of a given image are treated as random variables of an undirected graph.
• The latent variables associated with the pixel values represent labels that are to be inferred.
• The joint distribution of the pixels and the latent variables is subsequently exploited for various applications.
2 Problem Statement
The goal of this project is to implement a pairwise binary label-observation Markov random field
model for bi-level image segmentation. Specifically, two inference algorithms, the Iterative
Conditional Mode (ICM) and Gibbs sampling methods, will be implemented to perform segmentation
of the image shown in Figure 1. In summary, the goal is to segment the background
Figure 1: Image to be segmented
and foreground of Figure 1. As for notation, the random variables associated with the pixel
intensities (observed variables) are denoted by Y_i and can take any integer value between 0 and 255,
while the random variables X_i denote the background/foreground labels that are to be inferred.
The Potts model is employed to model the correlations between neighboring latent nodes, and the
intensities Y_i given X_i are modeled as Gaussian random variables.
3 Theory
Before we discuss the algorithms required for inference in such graphical models, we define a few
terms, in particular the Markov random field and the Potts model. An MRF is an undirected graph
that models the correlations between random variables satisfying the Markov condition stated in
(1). The Potts model, on the other hand, is a specific type of discrete pairwise Markov random
field in which the energy function takes the following form
Eij(Xi, Xj) = 1 − δ(Xi − Xj) , (4)
where δ is the delta function. The pairwise potential function between the random variables X_i,
which are the labels, is modeled as
ψ_ij(X_i, X_j) = e^{−E_ij(X_i, X_j)} .  (5)
Such a model encourages local smoothness, or consistency, among neighboring nodes. Put together,
the problem we study forms a label-observation model, a special kind of MRF. The unary potential
function considered in this project is

φ(X_i, Y_i) = e^{−E_i(X_i, Y_i)} ,  (6)

where E_i(X_i, Y_i) is modeled as
E_i(X_i, Y_i) = − log [ (1/(σ_x · √(2π))) · e^{−(1/2)·((y_i − µ_x)/σ_x)²} ] .  (7)
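As a concrete illustration, the two energy terms above can be written out directly. This is a minimal Python sketch (the report's own implementation is in MATLAB, given in the Appendix), with illustrative function names; (mu, sigma) stand for the Gaussian parameters of whichever label X_i takes:

```python
import math

def pairwise_energy(x_i, x_j):
    """Potts pairwise energy E_ij(X_i, X_j) = 1 - delta(X_i - X_j), eq. (4)."""
    return 0.0 if x_i == x_j else 1.0

def unary_energy(y_i, mu, sigma):
    """Gaussian negative log-likelihood E_i(X_i, Y_i), eq. (7), written in the
    algebraically equivalent expanded form to avoid exp() underflow."""
    return math.log(sigma * math.sqrt(2.0 * math.pi)) \
        + 0.5 * ((y_i - mu) / sigma) ** 2
```

Agreeing neighbors incur zero pairwise energy, so smooth label fields receive higher potential, while the unary term grows quadratically as the observed intensity moves away from the label's mean.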
By the definition of a pairwise Markov network, the joint density of all the pixel labels X = {X_i}
and their observations Y = {Y_i} can be written as
P(X, Y) = (1/Z) · exp( − Σ_{X_i ∈ V} α_i · E_i(X_i, Y_i) − Σ_{X_i ∈ V} Σ_{X_j ∈ N_{X_i}} β_ij · E_ij(X_i, X_j) ) .  (8)
In this project, the neighborhood of each X_i is restricted to its four nearest pixels, and the
parameters of the MRF are taken to be

µ_0 = 106.2943, σ_0 = 5.3705, µ_255 = 141.6888, σ_255 = 92.0635, α_i = 0.3975, β_ij = 2.3472 .  (9)
The goal is, given the pixel intensities Y of the image and the MRF model parameters W in (9), to
find the X* that maximizes P(X | Y, W). This is termed MAP inference, and we consider two popular
algorithms to perform it, introduced below.
3.1 Iterative Conditional Mode (ICM) method
The Iterated Conditional Modes (ICM) algorithm obtains a configuration that is a local maximum of
the joint probability of a Markov random field. It does so by iteratively maximizing the probability
of each variable conditioned on the rest. It starts by initializing the nodes to some starting states
and then cycles through the nodes in order. For every node i, we consider all possible states of
node i and replace its current state with the state that maximizes the joint potential. We keep
cycling through the nodes until a full cycle produces no change. Iteratively carrying out this
procedure is guaranteed to produce a local optimum of the joint potential that cannot be further
improved by changing the state of any single node. Algorithm 1 summarizes the procedure. At its
core, the task is
P(X | Y) ≈ ∏_i P(X_i | X_{−i}, Y_i) = ∏_i P(X_i | X_{N_i}, Y_i)  (10)

X*_i = arg max_{X_i} P(X_i | X_{N_i}, Y_i)  (11)
     = arg max_{X_i} P(X_i, X_{N_i}, Y_i)  (12)
     = arg min_{X_i} α_i · E_i(X_i, Y_i) + Σ_{X_j ∈ N_{X_i}} β_ij · E_ij(X_i, X_j)  (13)
Algorithm 1 Iterative Conditional Mode (ICM) algorithm
1: Inputs: MRF parameters in (9)
2: Initialize the label matrix X
3: while not converged do
4:   for every i ∈ V: X_i = arg max_{X_i} − α_i · E_i(X_i, Y_i) − Σ_{X_j ∈ N_{X_i}} β_ij · E_ij(X_i, X_j)
5: end while
6: Return X
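Algorithm 1 fits in a few lines. The following is a minimal Python sketch (the report's actual implementation is the MATLAB code in the Appendix), using the parameter values from (9); the helper names and the all-zeros initialization are illustrative choices:

```python
import math

MU = {0: 106.2943, 255: 141.6888}        # label-conditional means from (9)
SIGMA = {0: 5.3705, 255: 92.0635}        # label-conditional std devs from (9)
ALPHA, BETA = 0.3975, 2.3472

def unary(label, y):
    # E_i from (7), written in expanded form for numerical safety
    return math.log(SIGMA[label] * math.sqrt(2 * math.pi)) \
        + 0.5 * ((y - MU[label]) / SIGMA[label]) ** 2

def local_energy(X, Y, i, j, label):
    # alpha * E_i plus beta-weighted Potts energies over the 4-neighborhood, as in (13)
    e = ALPHA * unary(label, Y[i][j])
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < len(X) and 0 <= nj < len(X[0]):
            e += BETA * (X[ni][nj] != label)
    return e

def icm(Y, max_iter=50):
    X = [[0] * len(Y[0]) for _ in Y]     # all-zeros initialization
    for _ in range(max_iter):
        changes = 0
        for i in range(len(Y)):
            for j in range(len(Y[0])):
                best = min((0, 255), key=lambda s: local_energy(X, Y, i, j, s))
                changes += (best != X[i][j])
                X[i][j] = best
        if changes == 0:                 # no single-site move helps: local optimum
            break
    return X
```

On a tiny synthetic row of intensities, `icm([[106.0, 106.0, 230.0, 230.0]])` converges to the labels `[[0, 0, 255, 255]]`: the first two pixels sit near µ_0 and the last two are far better explained by the broad label-255 Gaussian.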
3.2 Gibbs Sampling Method
Gibbs Sampling is an MCMC algorithm that samples each random variable of a graphical model, one
at a time. Stationary distributions are of great importance in MCMC, and to understand them we
need to define a few terms for Markov chains (MCs):
• Irreducible: an MC is irreducible if you can get from any state x to any other state x′ with
nonzero probability in a finite number of steps, i.e., there are no unreachable parts of the state
space.
• Aperiodic: an MC is aperiodic if you can return to any state x at any time. Periodic MCs have
states that can only be returned to after a multiple of ≥ 2 time steps (cycles).
• Ergodic (or regular): an MC is ergodic if it is irreducible and aperiodic. Ergodicity is important:
it implies the chain reaches its stationary distribution no matter the initial distribution.
• All good MCMC algorithms must satisfy ergodicity, so that no initialization can prevent
convergence.
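The role of ergodicity can be seen numerically: for an irreducible, aperiodic chain, repeatedly applying the transition matrix drives any initial distribution to the same stationary one. A minimal Python sketch with a small hand-made 2-state chain (not from the report):

```python
def step(dist, P):
    """One Markov-chain step: new_dist[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Both states reachable from both (irreducible) and self-loops present
# (aperiodic), hence the chain is ergodic.
P = [[0.9, 0.1],
     [0.5, 0.5]]

d1, d2 = [1.0, 0.0], [0.0, 1.0]   # two very different starting distributions
for _ in range(200):
    d1, d2 = step(d1, P), step(d2, P)

# Both converge to the unique stationary distribution pi = pi * P,
# which for this P is (5/6, 1/6).
print(d1)  # ≈ [0.8333, 0.1667]
print(d2)  # ≈ [0.8333, 0.1667]
```

A periodic or reducible chain would not show this behavior: the initial distribution would keep influencing where the chain can be.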
This theory suggests how to sample from the stationary distribution. Start from an initial
configuration in which the X_i, ∀i ∈ V, are initialized, and then repeatedly pick a variable X_i
and sample it from P(X_i | X_{−i}). Owing to the Markov condition, this boils down to sampling
from P(X_i | X_{N_i}) repeatedly. When this is repeated a large number of times, the samples come
from the true distribution P*(X). To perform MAP inference, the sample mean of the samples
collected after the burn-in time can be computed to approximate the posterior mode:
X* = arg max_X P(X | Y) ≈ X̄ ,  (14)

where X̄ = (1/T) · Σ_t X^(t) is the sample mean of the T samples collected after the burn-in period.
Algorithm 2 Gibbs Sampling Algorithm
1: Inputs: MRF parameters and evidence variables Y_i
2: Set up X = {X_1, X_2, X_3, . . . , X_N}
3: Randomly initialize all the samples X^(0) = {x_1, x_2, x_3, . . . , x_N}
4: Repeat: randomly choose a node i and sample from P(X_i | X_{N_i})
5: While sampling, change the state of only one variable X_i at a time
6: Collect the samples {X^(T), X^(T+1), X^(T+2), . . . } after the burn-in period T
7: Compute X̄ and return X̄
Keep going past the burn-in period and then start collecting samples until enough have been
collected. There is no theoretical guidance as to how many samples we need to burn before we can
assume later samples come from the true distribution.
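Algorithm 2 can likewise be sketched in Python for the binary label-observation MRF (again, the report's implementation is the MATLAB code in the Appendix; the helper names, the fixed seed, and the 127.5 decision threshold on the sample mean are illustrative choices):

```python
import math, random

MU = {0: 106.2943, 255: 141.6888}
SIGMA = {0: 5.3705, 255: 92.0635}
ALPHA, BETA = 0.3975, 2.3472

def local_energy(X, Y, i, j, label):
    # alpha * E_i plus beta-weighted Potts terms over the 4-neighborhood
    e = ALPHA * (math.log(SIGMA[label] * math.sqrt(2 * math.pi))
                 + 0.5 * ((Y[i][j] - MU[label]) / SIGMA[label]) ** 2)
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < len(X) and 0 <= nj < len(X[0]):
            e += BETA * (X[ni][nj] != label)
    return e

def gibbs(Y, n_sweeps=60, burn_in=20, seed=0):
    rng = random.Random(seed)
    rows, cols = len(Y), len(Y[0])
    X = [[rng.choice((0, 255)) for _ in range(cols)] for _ in range(rows)]
    mean = [[0.0] * cols for _ in range(rows)]
    kept = 0
    for sweep in range(n_sweeps):
        for i in range(rows):
            for j in range(cols):
                # P(X_i = 0 | neighbors, Y_i) from the two local energies
                e0 = local_energy(X, Y, i, j, 0)
                e255 = local_energy(X, Y, i, j, 255)
                p0 = 1.0 / (1.0 + math.exp(e0 - e255))
                X[i][j] = 0 if rng.random() < p0 else 255
        if sweep >= burn_in:              # collect samples after burn-in
            kept += 1
            for i in range(rows):
                for j in range(cols):
                    mean[i][j] += X[i][j]
    # threshold the sample mean to approximate the MAP labels, as in (14)
    return [[0 if mean[i][j] / kept < 127.5 else 255 for j in range(cols)]
            for i in range(rows)]
```

On a uniform patch of intensities near µ_0 the sample mean settles near 0 and the thresholded output is all background, while a patch near the tail of the label-255 Gaussian comes out all foreground.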
4 Experiments
An MRF over the entire 640×360 image lattice with 4-neighborhood nodes is considered to perform
inference. The labels X are initialized in various ways, and the corresponding label outputs of the
ICM algorithm are displayed. Below are the segmented images for the various initializations, along
with the number of iterations each took to reach an accuracy of 0.1.
4.1 Iterative Conditional Mode (ICM)
Figure 2: ICM output with 100% zeros initialization (28 iterations to converge)
Figure 3: 100% zeros initialization
Figure 4: ICM output with 80% zeros initialization (25 iterations to converge)
Figure 5: 80% zeros initialization
As observed, the image is segmented successfully in all cases, and the segmentation results are
almost identical.
Figure 6: ICM output with 70% zeros initialization (23 iterations to converge)
Figure 7: 70% zeros initialization
Figure 8: ICM output with 50% zeros initialization (16 iterations to converge)
Figure 9: 50% zeros initialization
4.2 Gibbs Sampling Method
Again, an MRF over the entire 640×360 image lattice with 4-neighborhood nodes is considered to
perform inference. The labels X are initialized, and the corresponding label outputs of the Gibbs
sampling algorithm are displayed. The images differ owing to the computation of the sample mean
after different burn-in times.
Figure 10: Gibbs sampling output after the first 20 iterations of burn-in period
Figure 11: Gibbs sampling output after the first 50 iterations of burn-in period
As expected, as the number of iterations grows, the quality of the segmented image becomes better and better.
Figure 12: Gibbs sampling output after the first 100 iterations of burn-in period
5 Conclusion
This project performs MAP inference with two algorithms in MATLAB for application to image
segmentation. A summary of the tasks performed:
• The ICM and Gibbs sampling algorithms are validated on the given image for the segmentation
task; the background and foreground are segmented.
• Various initializations and burn-in periods for the two respective algorithms have also been
considered.
• Only the 4-neighborhoods are incorporated in the model, while other ways to define neighborhoods
exist and may attain better performance.
• Overall, the Gibbs sampling method produces better-quality segmentations: after the chain is
allowed to run for many iterations it mixes, making it superior to the ICM approximation.
6 Appendix
To reproduce the results, run the following .m files:
• "a_main_ICM.m"
• "a_main_Gibbs_sampling.m"
All the other files are helper functions called from these files.
Listing 1: a_main_ICM.m Code for ICM approach

clc;
clear all;
close all;

[Image1] = imread('Proj4_image.png','png');
Image = double(Image1');

Parameters.mu0 = 106.2943; Parameters.sigma0 = 5.3705;
Parameters.mu255 = 141.6888; Parameters.sigma255 = 92.0635;
Parameters.alpha = 0.3975; Parameters.beta = 2.3472;

X = zeros(size(Image));
imshow(X')

% Randomly flip 20% of the labels to 255 (80% zeros initialization)
vectorize = X(:);
random_integers = randperm(length(vectorize));
vectorize(random_integers(1:length(vectorize)/5)) = 255;
X = reshape(vectorize, [640,360]);

imshow(X')

[x_grid, y_grid] = size(X);
X_possible_values = [0, 255];
% X_{i} is the label of the ith pixel in {0, 255}: the UNDERLYING LABEL
% Y_{i} is the observation in [0, 255]: the INTENSITY of the pixel
iter_num = 50;
error = zeros(1, iter_num);
for iter = 1:iter_num

    for i = 1:x_grid
        for j = 1:y_grid

            % Two possible values for X
            E_min = joint_potential(X, i, j, 0, Image, Parameters);
            E_max = joint_potential(X, i, j, 255, Image, Parameters);

            % Find minimum energy or max posterior
            [~, arg_min_index] = min([E_min, E_max]);

            error(iter) = error(iter) + abs(X(i,j) - X_possible_values(arg_min_index));

            X(i,j) = X_possible_values(arg_min_index);
        end
    end

    fprintf('Error is %f after %d iterations\n', error(iter), iter);
    if error(iter) <= 1e-1
        break
    end
end

figure;
imshow(X')
Listing 2: a_main_Gibbs_sampling.m Code for Gibbs Sampling approach

clc;
clear all;
close all;

% Load Data
[Image1] = imread('Proj4_image.png','png');
Image = double(Image1');

% Parameters
Parameters.mu0 = 106.2943; Parameters.sigma0 = 5.3705;
Parameters.mu255 = 141.6888; Parameters.sigma255 = 92.0635;
Parameters.alpha = 0.3975; Parameters.beta = 2.3472;

X = zeros(size(Image));
[x_grid, y_grid] = size(X);
X_possible_values = [0, 255];
% X_{i} is the label of the ith pixel in {0, 255}: the UNDERLYING LABEL
% Y_{i} is the observation in [0, 255]: the INTENSITY of the pixel
iter_num = 50;
error = zeros(1, iter_num);
for iter = 1:iter_num

    for i = 1:x_grid
        for j = 1:y_grid

            % Two possible values for X
            E_min = joint_potential(X, i, j, 0, Image, Parameters);
            E_max = joint_potential(X, i, j, 255, Image, Parameters);
            values = normalization(exp(-E_min), exp(-E_max));
            sample = binornd(1, values(1));

            if sample == 0
                temp = X(i,j);
                X(i,j) = X_possible_values(2);
                error(iter) = error(iter) + abs(temp - X(i,j));
            else
                temp = X(i,j);
                X(i,j) = X_possible_values(1);
                error(iter) = error(iter) + abs(temp - X(i,j));
            end

        end
    end

    fprintf('The error is %f after %d iterations\n', error(iter), iter);
    if error(iter) <= 1e-1
        break
    end

end
figure;
imshow(X')
Listing 3: four_neighbour_rule.m Code for finding the four neighbors

function neighbors = four_neighbour_rule(X, i, j)
    [row, col] = size(X); neighbors = [];
    if i >= 2
        neighbors = [neighbors X(i-1,j)];
    end
    if i <= (row - 1)
        neighbors = [neighbors X(i+1, j)];
    end

    if j >= 2
        neighbors = [neighbors X(i, j-1)];
    end
    if j <= (col - 1)
        neighbors = [neighbors X(i, j+1)];
    end
end