A Project Report On
SIMULATING A FEED FORWARD ARTIFICIAL NEURAL
NETWORK IN C++
Submitted in partial fulfilment of the requirements
For the award of degree
Of
INTEGRATED DUAL DEGREE
In
COMPUTER SCIENCE AND ENGINEERING
(With Specialization in Information Technology)
Submitted by
Vaibhav Dhattarwal
CSE-IDD
Enrolment No: 08211018
Under the guidance of
DR. DURGA TOSHINWAL
Professor
ELECTRONICS AND COMPUTER ENGINEERING DEPARTMENT
INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
ROORKEE-247667
OCTOBER 2012
Abstract
This report describes how a feed forward artificial neural network was implemented in C++.
An artificial neural network is a system composed of many simple processing elements
operating in parallel, whose function is determined by the network structure, the connection
strengths, and the processing performed at the computing elements or nodes. It can be viewed
as a massively parallel distributed processor with a natural inclination for storing
experiential knowledge and making it available for use. The report first gives a brief
overview of artificial neural networks and examines their practical applicability. This is
followed by a detailed explanation of the design and implementation of a three-layer feed
forward neural network trained with the back propagation algorithm.
Table of Contents
Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Objective of the Project
Chapter 2 Artificial Neural Network
2.1 Neural Network Definition
2.2 Neural Network Applications
2.3 Neural Network Categorization
2.4 Types of Neural Network
Chapter 3 Design
3.1 The Back Propagation Algorithm
3.2 Pseudo Code for one Layer
3.3 Pseudo Code for all Layers
Chapter 4 Implementation
4.1 Pseudo Code for training patterns
4.2 Pseudo Code for minimizing error
Chapter 5 Results
References
List of Figures
Figure 2.1 An Artificial Neural Network
Figure 3.1 The sigmoid curve
Figure 3.2 The design for calculating output activation
Figure 5.1 Output Screenshot
1 Introduction
An Artificial Neural Network (ANN), usually called a neural network (NN), is a mathematical
or computational model inspired by the structure and functional aspects of biological neural
networks. A neural network consists of an interconnected group of artificial neurons, and it
processes information using a connectionist approach to computation. In most cases an ANN is
an adaptive system that changes its structure based on external or internal information that
flows through the network during the learning phase. Modern neural networks are non-linear
statistical data modelling tools. They are usually used to model complex relationships
between inputs and outputs or to find patterns in data.
A neural network is a set of connected input/output units in which each connection has an
associated weight. During the learning phase, the network learns by adjusting the weights so
that it predicts the correct class labels of the input tuples. Neural networks can derive
meaning from complicated or imprecise data, and can be used to extract patterns and detect
trends that are too complex to be noticed by humans or by other computer techniques. They are
well suited to continuous-valued inputs and outputs, and to prediction or forecasting tasks.
Neural networks can also be used to infer rules from the patterns they find in data. They are
useful in providing information on associations, classifications, clusters, and forecasting.
Using neural networks as a tool, data warehousing firms can harvest information from datasets
in the data mining process. Neural networks can be programmed to store, recognize, and
associatively retrieve patterns or database entries; to solve combinatorial optimization
problems; to filter noise from measurement data; and to control ill-defined problems. In
summary, they estimate sampled functions when the form of the functions is not known. These
two abilities, pattern recognition and function estimation, make neural networks a widely
used tool in data mining. As model-free estimators with this dual nature, neural networks
serve data mining in a variety of ways.
Depending on the architecture, neural networks provide associations, classifications,
clusters, prediction and forecasting to the data mining industry. A neural network
essentially comprises three pieces: the architecture or model, the learning algorithm, and
the activation functions. With neural networks, valuable information can be mined from a
mass of historical data so that it can be used efficiently in financial areas. Hence,
applications of neural networks in financial forecasting have become very popular.
1.1 Objective of the Project
This report introduces Artificial Neural Networks and describes their main characteristics.
The objective of the project is to implement a Feed Forward Artificial Neural Network in C++
using the back propagation algorithm. The design of the simulation is discussed first,
followed by an explanation of the implementation of the network. The results of the program
are included at the end of the report.
2 Artificial Neural Network
Figure 2.1 an Artificial Neural Network
First of all, when we are talking about a neural network, we should more properly say
"artificial neural network" (ANN), because that is what we mean most of the time in this
project. Biological neural networks are much more complicated than the mathematical
models we use for ANNs. But it is customary to be lazy and drop the "A" or the "artificial".
2.1 Neural Network Definition
There is no universally accepted definition of an NN. But perhaps most people in the field
would agree that an NN is a network of many simple processors ("units"), each possibly
having a small amount of local memory. The units are connected by communication channels
("connections") which usually carry numeric data, encoded by any of various means. The
units operate only on their local data and on the inputs they receive via the connections. The
restriction to local operations is often relaxed during training.
Some NNs are models of biological neural networks and some are not, but historically, much
of the inspiration for the field of NNs came from the desire to produce artificial systems
capable of sophisticated, perhaps "intelligent", computations similar to those that the human
brain routinely performs, and thereby possibly to enhance our understanding of the human
brain.
Most NNs have some sort of "training" rule whereby the weights of connections are adjusted
on the basis of data. In other words, NNs "learn" from examples, as children learn to
distinguish dogs from cats based on examples of dogs and cats. If trained carefully, NNs may
exhibit some capability for generalization beyond the training data, that is, to produce
approximately correct results for new cases that were not used for training.
NNs normally have great potential for parallelism, since the computations of the components
are largely independent of each other. Some people regard massive parallelism and high
connectivity to be defining characteristics of NNs, but such requirements rule out various
simple models, such as simple linear regression (a minimal feed forward net with only two
units plus bias), which are usefully regarded as special cases of NNs.
Some popular descriptive definitions of neural networks:
• A neural network is a system composed of many simple processing elements
  operating in parallel whose function is determined by network structure, connection
  strengths, and the processing performed at computing elements or nodes. A neural
  network is a massively parallel distributed processor that has a natural propensity for
  storing experiential knowledge and making it available for use. It resembles the brain
  in two respects:
  1. Knowledge is acquired by the network through a learning process.
  2. Interneuron connection strengths known as synaptic weights are used to store the
     knowledge.
• A neural network is a circuit composed of a very large number of simple processing
  elements that are neural based. Each element operates only on local information.
  Furthermore each element operates asynchronously; thus there is no overall system
  clock.
• Artificial neural systems, or neural networks, are physical cellular systems which can
  acquire, store, and utilize experiential knowledge.
2.2 Neural Network Applications
Practical applications of NNs most often employ supervised learning. For supervised
learning, you must provide training data that includes both the input and the desired result
(the target value). After successful training, you can present input data alone to the NN (that
is, input data without the desired result), and the NN will compute an output value that
approximates the desired result. However, for training to be successful, you may need lots of
training data and lots of computer time to do the training. In many applications, such as
image and text processing, you will have to do a lot of work to select appropriate input data
and to code the data as numeric values.
In practice, NNs are especially useful for classification and function approximation/mapping
problems which are tolerant of some imprecision, which have lots of training data available,
but to which hard and fast rules (such as those that might be used in an expert system) cannot
easily be applied. Almost any finite-dimensional vector function on a compact set can be
approximated to arbitrary precision by feed forward NNs (which are the type most often used
in practical applications) if you have enough data and enough computing resources.
In principle, NNs can compute any computable function, i.e., they can do everything a
normal digital computer can do, or perhaps even more, under some assumptions of doubtful
practicality.
Neural networks are of interest to many different groups of people:
• Computer scientists want to find out about the properties of non-symbolic information
  processing with neural nets and about learning systems in general.
• Statisticians use neural nets as flexible, nonlinear regression and classification
  models.
• Engineers of many kinds exploit the capabilities of neural networks in many areas,
  such as signal processing and automatic control.
• Cognitive scientists view neural networks as a possible apparatus to describe models
  of thinking and consciousness (high-level brain function).
• Neurophysiologists use neural networks to describe and explore medium-level brain
  function (e.g. memory, sensory system, and motor control).
• Physicists use neural networks to model phenomena in statistical mechanics and for
  many other tasks.
• Biologists use neural networks to interpret nucleotide sequences.
• Philosophers and some other people may also be interested in neural networks for
  various reasons.
2.3 Neural Network Categorization
There are many kinds of NNs by now, and nobody knows exactly how many. New ones (or at
least variations of old ones) are invented every week. Some of the best-known categories
are described below.
The two main kinds of learning algorithms are supervised and unsupervised.
• In supervised learning, the correct results (target values, desired outputs) are known
  and are given to the NN during training so that the NN can adjust its weights to try
  to match its outputs to the target values. After training, the NN is tested by giving it
  only input values, not target values, and seeing how close it comes to outputting the
  correct target values.
• In unsupervised learning, the NN is not provided with the correct results during
  training. Unsupervised NNs usually perform some kind of data compression, such as
  dimensionality reduction or clustering.
The distinction between supervised and unsupervised methods is not always clear-cut. An
unsupervised method can learn a summary of a probability distribution, then that summarized
distribution can be used to make predictions. Furthermore, supervised methods come in two
sub varieties: auto-associative and hetero-associative. In auto-associative learning, the target
values are the same as the inputs, whereas in hetero-associative learning, the targets are
generally different from the inputs. Many unsupervised methods are equivalent to auto-
associative supervised methods.
Two major kinds of network topology are feed forward and feedback.
• In a feed forward NN, the connections between units do not form cycles. Feed
  forward NNs usually produce a response to an input quickly. Most feed forward NNs
  can be trained using a wide variety of efficient conventional numerical methods in
  addition to algorithms invented by NN researchers.
• In a feedback or recurrent NN, there are cycles in the connections. In some
  feedback NNs, each time an input is presented, the NN must iterate for a potentially
  long time before it produces a response. Feedback NNs are usually more difficult to
  train than feed forward NNs.
Some kinds of NNs can be implemented as either feed forward or feedback networks.
NNs also differ in the kinds of data they accept. Two major kinds of data are categorical and
quantitative.
• Categorical variables take only a finite (technically, countable) number of possible
  values, and there are usually several or more cases falling into each category.
  Categorical variables may have symbolic values (e.g., "male" and "female", or "red",
  "green" and "blue") that must be encoded into numbers before being given to the
  network. Both supervised learning with categorical target values and unsupervised
  learning with categorical outputs are called "classification."
• Quantitative variables are numerical measurements of some attribute, such as length
  in meters. The measurements must be made in such a way that at least some
  arithmetic relations among the measurements reflect analogous relations among the
  attributes of the objects that are measured. Supervised learning with quantitative
  target values is called "regression."
Some variables can be treated as either categorical or quantitative, such as number of children
or any binary variable. Most regression algorithms can also be used for supervised
classification by encoding categorical target values as 0/1 binary variables and using those
binary variables as target values for the regression algorithm. The outputs of the network are
then estimates of posterior probabilities when any of the most common training methods are used.
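As a small illustration of the 0/1 encoding described above, the following sketch turns one
categorical value into a vector of binary target values, one per category. The function name
encodeOneOfN and the use of std::vector are assumptions made for this example; they are not
taken from the project code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: encode one categorical value as 0/1 target values,
// one per category. Exactly one entry is 1.0 when the label is known.
std::vector<double> encodeOneOfN(const std::vector<std::string>& categories,
                                 const std::string& label)
{
    std::vector<double> target(categories.size(), 0.0);
    bool found = false;
    for (std::size_t c = 0; c < categories.size(); ++c) {
        if (categories[c] == label) {
            target[c] = 1.0;
            found = true;
        }
    }
    assert(found && "label must be one of the known categories");
    return target;
}
```

For the categories "red", "green" and "blue", the label "green" would be encoded as the target
vector (0, 1, 0), which a network with three output units could then be trained to reproduce.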
2.4 Types of Neural Network
Here are some well-known kinds of Neural Networks:
A. Supervised
   1. Feed forward
      • Linear
         • Hebbian
         • Perceptron
         • Adaline
         • Higher Order
         • Functional Link
      • MLP: Multilayer perceptron
         • Backprop
         • Cascade Correlation
         • Quickprop
         • RPROP
      • RBF networks
         • OLS: Orthogonal Least Squares
      • CMAC: Cerebellar Model Articulation Controller
      • Classification only
         • LVQ: Learning Vector Quantization
         • PNN: Probabilistic Neural Network
      • Regression only
         • GRNN: General Regression Neural Network
   2. Feedback
      • BAM: Bidirectional Associative Memory
      • Boltzmann Machine
      • Recurrent time series
         • Back propagation through time
         • Elman
         • FIR: Finite Impulse Response
         • Jordan
         • Real-time recurrent network
         • Recurrent back propagation
         • TDNN: Time Delay NN
   3. Competitive
      • ARTMAP
      • Fuzzy ARTMAP
      • Gaussian ARTMAP
      • Counter propagation
      • Neocognitron
B. Unsupervised
   1. Competitive
      • Vector Quantization
         • Grossberg
         • Kohonen
         • Conscience
      • Self-Organizing Map
         • Kohonen
         • GTM
         • Local Linear
      • Adaptive resonance theory
         • ART 1
         • ART 2
         • ART 2-A
         • ART 3
         • Fuzzy ART
      • DCL: Differential Competitive Learning
   2. Dimension Reduction
      • Hebbian
      • Oja
      • Sanger
      • Differential Hebbian
   3. Auto association
      • Linear autoassociator
      • BSB: Brain State in a Box
      • Hopfield
3 Design
The simplified process for training a feed forward neural network is as follows:
1. Input data is presented to the network and propagated through the network until it
   reaches the output layer. This forward process produces a predicted output.
2. The predicted output is subtracted from the target output, and an error value for the
   network is calculated.
3. The network is then trained with a supervised learning algorithm, which in most cases
   is back propagation. Back propagation adjusts the weights: it starts with the weights
   between the output layer processing elements (PEs) and the last hidden layer PEs, and
   works backwards through the network.
4. Once back propagation has finished, the forward process starts again, and this cycle
   continues until the error between predicted and target outputs is minimized.
3.1 The Back Propagation Algorithm
Back propagation, or propagation of error, is a common method of training artificial neural
networks so as to minimize an objective function. The algorithm is used in layered feed
forward ANNs: the artificial neurons are organized in layers and send their signals
“forward”, and then the errors are propagated backwards. Back propagation uses supervised
learning, which means that we provide the algorithm with examples of the inputs and the
outputs we want the network to compute, and then the error (the difference between actual
and expected results) is calculated. The idea of the algorithm is to reduce this error until
the ANN learns the training data.
Algorithm for a 3-layer network:
1. Initialize the weights in the network
2. Do
   a. For each example E in the training set
      • O = Neural-net-output (network, E); forward pass
      • T = teacher output for E
      • Calculate error (T - O) at the output units
      • Compute ΔWho for all weights from hidden layer to output layer; backward pass
      • Compute ΔWih for all weights from input layer to hidden layer; backward pass continued
      • Update the weights in the network
3. Until all examples classified correctly or stopping criterion satisfied
4. Return the network
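Step 1 of the algorithm above does not say how the weights are initialized. A common choice,
shown here only as a sketch, is to draw small random values around zero. The function name
initWeights, the uniform distribution, the range and the fixed seed are all assumptions of
this example, not details taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper for step 1: fill a (rows x cols) weight matrix with
// small random values drawn uniformly from [-range, range).
std::vector<std::vector<double>> initWeights(std::size_t rows, std::size_t cols,
                                             double range, unsigned seed)
{
    assert(range > 0.0);
    std::mt19937 generator(seed);  // a fixed seed gives repeatable runs
    std::uniform_real_distribution<double> uniform(-range, range);
    std::vector<std::vector<double>> weights(rows, std::vector<double>(cols));
    for (auto& row : weights) {
        for (double& w : row) {
            w = uniform(generator);
        }
    }
    return weights;
}
```

Starting from small, non-identical weights keeps the sigmoid units away from their flat
regions and lets different hidden units learn different features.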
The back propagation learning algorithm can be divided into two phases:
Phase 1: Propagation
This phase involves the following steps:
1. Forward propagation of a training pattern's input through the neural network to
   generate the output activations.
2. Backward propagation of the output activations through the neural network, using the
   training pattern's target, to generate the deltas of all output and hidden neurons.
Phase 2: Weight update
For each weight (synapse) the following steps are used:
1. Multiply its output delta and input activation to get the gradient of the weight.
2. Move the weight in the opposite direction of the gradient by subtracting a ratio of
   the gradient from the weight. This ratio is the learning rate.
Repeat phases 1 and 2 until the performance of the network is satisfactory.
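The two steps of the weight update phase can be sketched for a single weight as follows. The
function name gradientStep and its parameter names are assumptions of this example. Here the
delta is taken to be the derivative of the error with respect to the unit's total input, so
the scaled gradient is subtracted; if the delta is instead defined from (target - output),
its sign is reversed and the scaled product is added.

```cpp
#include <cassert>

// One gradient descent step for a single weight.
// delta:      dError/dInput of the unit the connection feeds into
// activation: output of the unit the connection comes from
// rate:       learning rate (the "ratio" of the gradient that is subtracted)
double gradientStep(double weight, double delta, double activation, double rate)
{
    assert(rate > 0.0);
    const double gradient = delta * activation;  // dError/dWeight
    return weight - rate * gradient;             // move against the gradient
}
```

For example, a weight of 1.0 with delta 0.5, input activation 2.0 and learning rate 0.25 has
gradient 1.0 and is moved to 0.75.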
3.2 Pseudo Code for one Layer
A single neuron (i.e. processing element, PE) takes in a total input PEinput and produces an
output activation PEout. In this project, the activation function is the sigmoid function, so
PEout = Sigmoid(PEinput). Other activation functions are often used (e.g. linear or hyperbolic
tangent). The sigmoid function is the special case of the logistic function shown below and
defined by the formula
$$S(t) = \frac{1}{1 + e^{-t}}$$
Figure 3.1 the sigmoid curve
This has the effect of squashing the infinite range of PEinput into the range 0 to 1. It also
has the convenient property that its derivative takes the particularly simple form
$$\frac{dS}{dt} = S(t)\,(1 - S(t))$$
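The sigmoid and its derivative translate directly into C++. The sketch below is illustrative
only; the function names sigmoid and sigmoidDerivative are assumptions of this example. The
derivative is written in terms of the sigmoid's output, which is the form the simple
expression above makes possible.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid: squashes any real input into the range 0 to 1.
double sigmoid(double t)
{
    return 1.0 / (1.0 + std::exp(-t));
}

// Derivative dS/dt written in terms of the sigmoid output s = S(t),
// so no second call to exp() is needed.
double sigmoidDerivative(double s)
{
    assert(s >= 0.0 && s <= 1.0);
    return s * (1.0 - s);
}
```

At t = 0 the sigmoid returns 0.5, where the derivative reaches its maximum value of 0.25.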
Typically, the input PEinput into a given neuron will be the weighted sum of output activations
feeding in from a number of other neurons. It is convenient to think of the activations flowing
through layers of neurons. So, if there are NumUnitLayer1 neurons in layer 1, the total
activation flowing into our layer 2 neuron is the sum over the product OutputLayer1[i]*Wt[i],
where Wt[i] is the strength/weight of the connection between PE[i] in layer 1 and our PE in
layer 2. Each neuron will also have a bias, or resting state, that is added to the sum of inputs,
and it is convenient to call this Wt[0]. We can then write
InputLayer2 = Wt[0] ;                              // start from the resting state (bias) weight
for ( i = 1 ; i <= NumUnitLayer1 ; i++ )           // loop over all layer 1 units
{
    InputLayer2 += OutputLayer1[i] * Wt[i] ;       // add the weighted activation
}
OutputLayer2 = 1.0 / (1.0 + exp(-InputLayer2)) ;   // sigmoid gives the activation output
Similarly layer 2 will have many processing elements as well, so it is appropriate to write the
weights between PE[i] in layer 1 and PE[j] in layer 2 as a two dimensional array Wt[i][j].
Thus to get the output of PE[j] in layer 2 we have
InputLayer2[j] = Wt[0][j] ;                              // bias weight of PE[j]
for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
{
    InputLayer2[j] += OutputLayer1[i] * Wt[i][j] ;       // add the weighted activation
}
OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j])) ;   // sigmoid gives the activation output
Now we know that Layer 2 has number of processing units given by NumUnitLayer2 and the
above code calculates the output for only one processing element PE[j]. However we require
the output for all the processing elements in Layer 2. Hence we introduce another loop to get
all the layer 2 outputs
for ( j = 1 ; j <= NumUnitLayer2 ; j++ )                     // loop over all layer 2 units
{
    InputLayer2[j] = Wt[0][j] ;                              // bias weight of PE[j]
    for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
    {
        InputLayer2[j] += OutputLayer1[i] * Wt[i][j] ;       // add the weighted activation
    }
    OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j])) ;   // sigmoid output of PE[j]
}
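The pseudo code for one layer can be sketched in C++ as shown below. This is an illustration
under stated assumptions, not the project's source: the name layerOutput, the Matrix alias
and the use of std::vector are choices made for this example. Unlike the pseudo code, the
inputs are indexed from 0, so the weight from input i is stored in row i + 1 and row 0 still
holds the bias weights.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Compute the activations of one layer from the activations of the previous one.
// Assumed layout: weights[0][j] is the bias of unit j, and weights[i + 1][j] is
// the weight from input i to unit j (inputs are 0-based here).
std::vector<double> layerOutput(const std::vector<double>& inputs,
                                const Matrix& weights)
{
    assert(weights.size() == inputs.size() + 1);
    const std::size_t numUnits = weights[0].size();
    std::vector<double> outputs(numUnits);
    for (std::size_t j = 0; j < numUnits; ++j) {
        double net = weights[0][j];                      // bias
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            net += inputs[i] * weights[i + 1][j];        // weighted sum
        }
        outputs[j] = 1.0 / (1.0 + std::exp(-net));       // sigmoid
    }
    return outputs;
}
```

With inputs (1, 2), bias weights (0, 1) and input weights (1, 0) and (-0.5, 0), the first unit
receives a total input of 0 and outputs 0.5, while the second receives 1 and outputs about
0.731.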
3.3 Pseudo Code for all Layers
Now that we have calculated the output for all the processing elements in one layer, we can
write the code which calculates the output for all the layers in our network. A three-layer
network is sufficient for many purposes, so our layer 2 outputs feed into a third layer in
the same way as in the cases above. The feed forward neural network chosen for this project
has three layers 1, 2, 3, and the calculation of the output for all three layers of the
network is as follows:
for ( j = 1 ; j <= NumUnitLayer2 ; j++ )                     // computes Layer 2 outputs
{
    InputLayer2[j] = WtLayer1/Layer2[0][j] ;                 // bias weight
    for ( i = 1 ; i <= NumUnitLayer1 ; i++ )
    {
        InputLayer2[j] += OutputLayer1[i] * WtLayer1/Layer2[i][j] ;
    }
    OutputLayer2[j] = 1.0 / (1.0 + exp(-InputLayer2[j])) ;   // sigmoid output
}
for ( k = 1 ; k <= NumUnitLayer3 ; k++ )                     // computes Layer 3 outputs
{
    InputLayer3[k] = WtLayer2/Layer3[0][k] ;                 // bias weight
    for ( j = 1 ; j <= NumUnitLayer2 ; j++ )
    {
        InputLayer3[k] += OutputLayer2[j] * WtLayer2/Layer3[j][k] ;
    }
    OutputLayer3[k] = 1.0 / (1.0 + exp(-InputLayer3[k])) ;   // sigmoid output
}
To avoid confusion in the pseudo code there is a different index for each layer: i, j, k
for Layers 1, 2, 3 respectively. Weights for connections are also different for
distinguishing between the different layers, WtLayer1/Layer2 and WtLayer2/Layer3. For
obvious reasons, for three layer networks, it is traditional to call layer 1
the Input layer, layer 2 the Hidden layer, and layer 3 the Output layer. The neural
network in this project has a design similar to the figure shown below.
Figure 3.2 the design for calculating output activation
Now we can denote the layers 1, 2, 3 as input layer, hidden layer, and output layer
respectively. The weights for the connections have also been denoted appropriately. As
shown in the above figure, the initial bias weights are also included in the input for each layer
and consequently the output also.
for ( j = 1 ; j <= NumUnitHidden ; j++ )                     // computes Hidden Layer PE outputs
{
    InputHidden[j] = WtInput/Hidden[0][j] ;                  // bias weight
    for ( i = 1 ; i <= NumUnitInput ; i++ )
    {
        InputHidden[j] += OutputInput[i] * WtInput/Hidden[i][j] ;
    }
    OutputHidden[j] = 1.0 / (1.0 + exp(-InputHidden[j])) ;   // sigmoid output
}
for ( k = 1 ; k <= NumUnitOutput ; k++ )                     // computes Output Layer PE outputs
{
    InputOutput[k] = WtHidden/Output[0][k] ;                 // bias weight
    for ( j = 1 ; j <= NumUnitHidden ; j++ )
    {
        InputOutput[k] += OutputHidden[j] * WtHidden/Output[j][k] ;
    }
    Output[k] = 1.0 / (1.0 + exp(-InputOutput[k])) ;         // sigmoid output
}
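The complete forward pass through the input, hidden and output layers can be sketched in C++
as follows. The type names ThreeLayerNet and Activations and the function names feedLayer and
feedForward are assumptions made for this example, as is the 0-based indexing of units (row 0
of each matrix holds the bias weights, and row i + 1 the weights leaving unit i).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Assumed layout for both matrices: row 0 holds the biases, and row i + 1
// holds the weights leaving unit i of the previous layer.
struct ThreeLayerNet {
    Matrix weightInputHidden;   // (numInput + 1) x numHidden
    Matrix weightHiddenOutput;  // (numHidden + 1) x numOutput
};

struct Activations {
    std::vector<double> hidden;
    std::vector<double> output;
};

// Sigmoid layer: bias plus weighted sum of the previous activations.
static std::vector<double> feedLayer(const std::vector<double>& in, const Matrix& w)
{
    assert(w.size() == in.size() + 1);
    std::vector<double> out(w[0].size());
    for (std::size_t j = 0; j < out.size(); ++j) {
        double net = w[0][j];
        for (std::size_t i = 0; i < in.size(); ++i) {
            net += in[i] * w[i + 1][j];
        }
        out[j] = 1.0 / (1.0 + std::exp(-net));
    }
    return out;
}

// Forward pass: input layer -> hidden layer -> output layer.
Activations feedForward(const ThreeLayerNet& net, const std::vector<double>& input)
{
    Activations a;
    a.hidden = feedLayer(input, net.weightInputHidden);
    a.output = feedLayer(a.hidden, net.weightHiddenOutput);
    return a;
}
```

The hidden activations are returned together with the outputs because the backward pass
needs both when the weight changes are computed.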
4 Implementation
4.1 Pseudo Code for training patterns
In this project, there is a whole set of training patterns, NumExamples in number, i.e. pairs
of input and target output vectors,
Input[E][i] , Target[E][k]
labelled by the index E. The network learns by minimizing some measure of the error of the
network's actual outputs compared with the target outputs. The sum squared error over all the
output units, indexed by k, and all training patterns, indexed by E, is given by
Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )           // all training patterns
{
    for ( k = 1 ; k <= NumUnitOutput ; k++ )     // all output units
    {
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
    }
}
The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning
algorithm. If we insert the above code for computing the network outputs into the E loop of
this, we end up with
Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )               // computes for all training patterns (E)
{
    for ( j = 1 ; j <= NumUnitHidden ; j++ )         // hidden layer activations
    {
        InputHidden[E][j] = WtInput/Hidden[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInput/Hidden[i][j] ;
        }
        OutputHidden[E][j] = 1.0 / (1.0 + exp(-InputHidden[E][j])) ;
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )         // output layer activations and error
    {
        InputOutput[E][k] = WtHidden/Output[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHidden/Output[j][k] ;
        }
        Output[E][k] = 1.0 / (1.0 + exp(-InputOutput[E][k])) ;
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
    }
}
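The error measure used above can be isolated in a small C++ function. This is a sketch only:
the name sumSquaredError and the Matrix alias are assumptions of this example, and the
outputs are taken to have been computed already by a forward pass for every pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Sum-squared error over all patterns E and output units k.
// targets[E][k] and outputs[E][k] are assumed to have identical shapes.
double sumSquaredError(const Matrix& targets, const Matrix& outputs)
{
    assert(targets.size() == outputs.size());
    double error = 0.0;
    for (std::size_t e = 0; e < targets.size(); ++e) {
        assert(targets[e].size() == outputs[e].size());
        for (std::size_t k = 0; k < targets[e].size(); ++k) {
            const double diff = targets[e][k] - outputs[e][k];
            error += 0.5 * diff * diff;
        }
    }
    return error;
}
```

For two patterns with targets 1 and 0 and an output of 0.5 in both cases, each pattern
contributes 0.5 * 0.25, giving a total error of 0.25.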
4.2 Pseudo Code for minimizing error
The next stage of the project involves iteratively adjusting the weights to minimize the
network's error. The method adopted in this project is by 'gradient descent' on the error
function. We can compute how much the error is changed by a small change in each weight
(i.e. compute the partial derivatives dError/dWt) and shift the weights by a small amount in
the direction that reduces the error. As stated before, we use the back-propagation algorithm.
After the calculation of the above sum squared error, we can compute and apply one iteration
(or 'epoch') of the required weight changes ΔWho and ΔWih using
Error = 0.0 ;
for ( E = 1 ; E <= NumExamples ; E++ )               // computes for all training patterns (E)
{
    for ( j = 1 ; j <= NumUnitHidden ; j++ )         // hidden layer activations
    {
        InputHidden[E][j] = WtInput/Hidden[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            InputHidden[E][j] += OutputInput[E][i] * WtInput/Hidden[i][j] ;
        }
        OutputHidden[E][j] = 1.0 / (1.0 + exp(-InputHidden[E][j])) ;
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )         // output layer activations and error
    {
        InputOutput[E][k] = WtHidden/Output[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            InputOutput[E][k] += OutputHidden[E][j] * WtHidden/Output[j][k] ;
        }
        Output[E][k] = 1.0 / (1.0 + exp(-InputOutput[E][k])) ;
        Error += 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ;
        // output error multiplied by the derivative of the sigmoid
        ΔOutput[k] = (Target[E][k] - Output[E][k]) * Output[E][k] * (1.0 - Output[E][k]) ;
    }
    for ( j = 1 ; j <= NumUnitHidden ; j++ )         // back propagation of error to hidden layer
    {
        SumΔOutput[j] = 0.0 ;
        for ( k = 1 ; k <= NumUnitOutput ; k++ )
        {
            SumΔOutput[j] += WtHidden/Output[j][k] * ΔOutput[k] ;
        }
        // propagated error multiplied by the derivative of the sigmoid
        ΔH[j] = SumΔOutput[j] * OutputHidden[E][j] * (1.0 - OutputHidden[E][j]) ;
    }
    for ( j = 1 ; j <= NumUnitHidden ; j++ )         // updates the input to hidden weights
    {
        ΔWih[0][j] = β * ΔH[j] + α * ΔWih[0][j] ;    // bias weight change
        WtInput/Hidden[0][j] += ΔWih[0][j] ;
        for ( i = 1 ; i <= NumUnitInput ; i++ )
        {
            ΔWih[i][j] = β * OutputInput[E][i] * ΔH[j] + α * ΔWih[i][j] ;
            WtInput/Hidden[i][j] += ΔWih[i][j] ;
        }
    }
    for ( k = 1 ; k <= NumUnitOutput ; k++ )         // updates the hidden to output weights
    {
        ΔWho[0][k] = β * ΔOutput[k] + α * ΔWho[0][k] ;   // bias weight change
        WtHidden/Output[0][k] += ΔWho[0][k] ;
        for ( j = 1 ; j <= NumUnitHidden ; j++ )
        {
            ΔWho[j][k] = β * OutputHidden[E][j] * ΔOutput[k] + α * ΔWho[j][k] ;
            WtHidden/Output[j][k] += ΔWho[j][k] ;
        }
    }
}
The weight changes ΔWih and ΔWho are each made up of two components. The first, the β
component, is the gradient descent contribution, scaled by the learning rate β. The second,
the α component, is a 'momentum' term which effectively keeps a moving average of the
gradient descent weight change contributions, and thus smooths out the overall weight changes.
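The error signals and the two-component weight change can be written as three small C++
functions. This is a sketch under stated assumptions: the function names outputDelta,
hiddenDelta and momentumUpdate are invented for this example, while beta and alpha stand for
the β and α of the pseudo code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Error signal of an output unit with sigmoid activation.
double outputDelta(double target, double output)
{
    return (target - output) * output * (1.0 - output);
}

// Error signal of a hidden unit: the output deltas are propagated back
// through the hidden-to-output weights leaving this unit.
double hiddenDelta(double hiddenOutput,
                   const std::vector<double>& weightsToOutputs,
                   const std::vector<double>& outputDeltas)
{
    assert(weightsToOutputs.size() == outputDeltas.size());
    double sum = 0.0;
    for (std::size_t k = 0; k < outputDeltas.size(); ++k) {
        sum += weightsToOutputs[k] * outputDeltas[k];
    }
    return sum * hiddenOutput * (1.0 - hiddenOutput);
}

// One weight change with momentum:
//   change = beta * activation * delta + alpha * previousChange
// The weight is updated in place and the new change is returned so that
// the caller can store it for the next update.
double momentumUpdate(double& weight, double previousChange,
                      double activation, double delta,
                      double beta, double alpha)
{
    const double change = beta * activation * delta + alpha * previousChange;
    weight += change;
    return change;
}
```

For a bias weight the activation argument is simply 1.0. Setting alpha to 0 removes the
momentum term and leaves plain gradient descent.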
The complete training process will consist of repeating the above weight updates for a
number of epochs until some error criterion is met.
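The outer loop over epochs can be sketched as shown below. The function name trainUntil, the
callback runEpoch, the tolerance and the epoch limit are all assumptions of this example; the
report itself only states that training repeats until some error criterion is met.

```cpp
#include <cassert>
#include <functional>

// Repeat epochs until the error falls below `tolerance` or `maxEpochs`
// is reached. `runEpoch` is a hypothetical callback that performs one pass
// over the training set (with weight updates) and returns the error.
// Returns the number of epochs that were run.
int trainUntil(const std::function<double()>& runEpoch,
               double tolerance, int maxEpochs)
{
    assert(maxEpochs >= 0);
    int epoch = 0;
    while (epoch < maxEpochs) {
        const double error = runEpoch();
        ++epoch;
        if (error < tolerance) {
            break;  // error criterion met
        }
    }
    return epoch;
}
```

The epoch limit guards against the case where the error never drops below the tolerance, for
example when the network is stuck in a local minimum.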
5 Results
Figure 5.1 Output Screenshot
The program based on the design discussed in the previous section was executed
successfully. The pseudo code was successfully implemented and the three layered feed
forward neural network was simulated on the basis of the back propagation algorithm.
6 References
[1] Pinkus, A. (1999), "Approximation theory of the MLP model in neural networks,"
Acta Numerica, 8, 143-196.
[2] Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan.
[3] Nigrin, A. (1993), Neural Networks for Pattern Recognition, Cambridge, MA: The
MIT Press.
[4] Zurada, J.M. (1992), Introduction To Artificial Neural Systems, Boston: PWS
Publishing Company.
[5] Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford
University Press.
[6] Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization and
Signal Processing. NY: John Wiley & Sons, ISBN 0-471-93010-5.
[7] Diamantaras, K.I., and Kung, S.Y. (1996) Principal Component Neural Networks:
Theory and Applications, NY: Wiley.
[8] Fausett, L. (1994), Fundamentals of Neural Networks, Englewood Cliffs, NJ: Prentice
Hall.
[9] Kosko, B.(1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, N.J.:
Prentice-Hall.
[10] Masters, T. (1993). Practical Neural Network Recipes in C++, San Diego: Academic
Press.
[11] Masters, T. (1995) Advanced Algorithms for Neural Networks: A C++
Sourcebook, NY: John Wiley and Sons, ISBN 0-471-10588-0
[12] Oja, E. (1989), "Neural networks, principal components, and subspaces,"
International Journal of Neural Systems, 1, 61-68.
[13] Pao, Y. H. (1989), Adaptive Pattern Recognition and Neural Networks, Reading, MA:
Addison-Wesley Publishing Company, ISBN 0-201-12584-6.
[14] Reed, R.D., and Marks, R.J, II (1999), Neural Smithing: Supervised Learning in Feed
forward Artificial Neural Networks, Cambridge, MA: The MIT Press, ISBN 0-262-
18190-8.
[15] Sanger, T.D. (1989), "Optimal unsupervised learning in a single-layer linear Feed
forward neural network," Neural Networks, 2, 459-473.

Neural networks are programmed to store, recognize, and associatively retrieve patterns or database entries; to solve combinatorial optimization problems; to filter noise from measurement data; to control ill-defined problems; in summary, to estimate sampled functions when we do not know the form of the functions. The two abilities: pattern recognition and function estimation make neural networks a very prevalent utility in data mining. With their model-free estimators and their dual nature, neural networks serve data mining in a variety of ways.
  • 6. Neural networks, depending on the architecture, provide associations, classifications, clusters, prediction and forecasting to the data mining industry. Neural networks essentially comprise three pieces: the architecture or model; the learning algorithm; and the activation functions. Due to neural networks, we can mine valuable information from a mass of history information so that it can be efficiently used in financial areas. Hence, the applications of neural networks in financial forecasting have become very popular. 1.1 Objective of the Project The introduction of Artificial Neural Networks and a description of the Neural Networks are presented in this project report. The objective of this project is to implement a Feed Forward Artificial Neural Network in C++ using the back propagation algorithm. The design of this simulation has also been discussed followed by an explanation of the implementation of the Network. The results of the output program will also be included in this project.
  • 7. 2 Artificial Neural Network Figure 2.1 an Artificial Neural Network First of all, when we are talking about a neural network, we should more properly say "artificial neural network" (ANN), because that is what we mean most of the time in this project. Biological neural networks are much more complicated than the mathematical models we use for ANNs. But it is customary to be lazy and drop the "A" or the "artificial". 2.1 Neural Network Definition There is no universally accepted definition of an NN. But perhaps most people in the field would agree that an NN is a network of many simple processors ("units"), each possibly having a small amount of local memory. The units are connected by communication channels ("connections") which usually carry numeric data, encoded by any of various means. The units operate only on their local data and on the inputs they receive via the connections. The restriction to local operations is often relaxed during training.
  • 8. Some NNs are models of biological neural networks and some are not, but historically, much of the inspiration for the field of NNs came from the desire to produce artificial systems capable of sophisticated, perhaps "intelligent", computations similar to those that the human brain routinely performs, and thereby possibly to enhance our understanding of the human brain. Most NNs have some sort of "training" rule whereby the weights of connections are adjusted on the basis of data. In other words, NNs "learn" from examples, as children learn to distinguish dogs from cats based on examples of dogs and cats. If trained carefully, NNs may exhibit some capability for generalization beyond the training data, that is, to produce approximately correct results for new cases that were not used for training. NNs normally have great potential for parallelism, since the computations of the components are largely independent of each other. Some people regard massive parallelism and high connectivity to be defining characteristics of NNs, but such requirements rule out various simple models, such as simple linear regression (a minimal feed forward net with only two units plus bias), which are usefully regarded as special cases of NNs. Some popular descriptive definitions of Neural Networks  A neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes. A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects: 1. Knowledge is acquired by the network through a learning process. 2. Interneuron connection strengths known as synaptic weights are used to store the knowledge. 
 A neural network is a circuit composed of a very large number of simple processing elements that are neural based. Each element operates only on local information. Furthermore each element operates asynchronously; thus there is no overall system clock.
  • 9.  Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store, and utilize experiential knowledge. 2.2 Neural Network Applications Practical applications of NNs most often employ supervised learning. For supervised learning, you must provide training data that includes both the input and the desired result (the target value). After successful training, you can present input data alone to the NN (that is, input data without the desired result), and the NN will compute an output value that approximates the desired result. However, for training to be successful, you may need lots of training data and lots of computer time to do the training. In many applications, such as image and text processing, you will have to do a lot of work to select appropriate input data and to code the data as numeric values. In practice, NNs are especially useful for classification and function approximation/mapping problems which are tolerant of some imprecision, which have lots of training data available, but to which hard and fast rules (such as those that might be used in an expert system) cannot easily be applied. Almost any finite-dimensional vector function on a compact set can be approximated to arbitrary precision by feed forward NNs (which are the type most often used in practical applications) if you have enough data and enough computing resources. In principle, NNs can compute any computable function, i.e., they can do everything a normal digital computer can do, or perhaps even more, under some assumptions of doubtful practicality. Neural Networks are interesting for quite a lot of very different people:  Computer scientists want to find out about the properties of non-symbolic information processing with neural nets and about learning systems in general.  Statisticians use neural nets as flexible, nonlinear regression and classification models. 
 Engineers of many kinds exploit the capabilities of neural networks in many areas, such as signal processing and automatic control.
  • 10.  Cognitive scientists view neural networks as a possible apparatus to describe models of thinking and consciousness (High-level brain function).  Neurophysiologists use neural networks to describe and explore medium-level brain function (e.g. memory, sensory system, and motorics).  Physicists use neural networks to model phenomena in statistical mechanics and for a lot of other tasks.  Biologists use Neural Networks to interpret nucleotide sequences.  Philosophers and some other people may also be interested in Neural Networks for various reasons. 2.3 Neural Network Categorization There are many kinds of NNs by now. Nobody knows exactly how many. New ones (or at least variations of old ones) are invented every week. Below is a collection of some of the most well known methods: The two main kinds of learning algorithms are supervised and unsupervised.  In supervised learning, the correct results (target values, desired outputs) are known and are given to the NN during training so that the NN can adjust its weights to try matching its outputs to the target values. After training, the NN is tested by giving it only input values, not target values, and seeing how close it comes to outputting the correct target values.  In unsupervised learning, the NN is not provided with the correct results during training. Unsupervised NNs usually perform some kind of data compression, such as dimensionality reduction or clustering. The distinction between supervised and unsupervised methods is not always clear-cut. An unsupervised method can learn a summary of a probability distribution, then that summarized distribution can be used to make predictions. Furthermore, supervised methods come in two sub varieties: auto-associative and hetero-associative. In auto-associative learning, the target values are the same as the inputs, whereas in hetero-associative learning, the targets are generally different from the inputs. 
Many unsupervised methods are equivalent to auto- associative supervised methods.
  • 11. Two major kinds of network topology are feed forward and feedback.  In a feed forward NN, the connections between units do not form cycles. Feed forward NNs usually produce a response to an input quickly. Most Feed forward NNs can be trained using a wide variety of efficient conventional numerical methods in addition to algorithms invented by NN researchers.  In a feedback or recurrent NN, there are cycles in the connections. In some feedback NNs, each time an input is presented, the NN must iterate for a potentially long time before it produces a response. Feedback NNs are usually more difficult to train than Feed forward NNs. Some kinds of NNs can be implemented as either Feed forward or feedback networks. NNs also differ in the kinds of data they accept. Two major kinds of data are categorical and quantitative.  Categorical variables take only a finite (technically, countable) number of possible values, and there are usually several or more cases falling into each category. Categorical variables may have symbolic values (e.g., "male" and "female", or "red", "green" and "blue") that must be encoded into numbers before being given to the network. Both supervised learning with categorical target values and unsupervised learning with categorical outputs are called "classification."  Quantitative variables are numerical measurements of some attribute, such as length in meters. The measurements must be made in such a way that at least some arithmetic relations among the measurements reflect analogous relations among the attributes of the objects that are measured. Supervised learning with quantitative target values is called "regression." Some variables can be treated as either categorical or quantitative, such as number of children or any binary variable. 
Most regression algorithms can also be used for supervised classification by encoding categorical target values as 0/1 binary variables and using those binary variables as target values for the regression algorithm. The outputs of the network are posterior probabilities when any of the most common training methods are used.
  • 12. 2.4 Types of Neural Network Here are some well-known kinds of Neural Networks: A. Supervised 1. Feed forward  Linear  Hebbian  Perceptron  Adaline  Higher Order  Functional Link  MLP: Multilayer perceptron  Backprop  Cascade Correlation  Quickprop  RPROP  RBF networks  OLS: Orthogonal Least Squares  CMAC: Cerebellar Model Articulation Controller  Classification only  LVQ: Learning Vector Quantization  PNN: Probabilistic Neural Network  Regression only  GNN: General Regression Neural Network 2. Feedback  BAM: Bidirectional Associative Memory  Boltzman Machine  Recurrent time series  Back propagation through time  Elman  FIR: Finite Impulse Response  Jordan  Real-time recurrent network
  • 13.  Recurrent back propagation  TDNN: Time Delay NN 3. Competitive  ARTMAP  Fuzzy ARTMAP  Gaussian ARTMAP  Counter propagation  Neocognitron B. Unsupervised 1. Competitive  Vector Quantization  Grossberg  Kohonen  Conscience  Self-Organizing Map  Kohonen  GTM:  Local Linear  Adaptive resonance theory  ART 1  ART 2  ART 2-A  ART 3  Fuzzy ART  DCL: Differential Competitive Learning 2. Dimension Reduction  Hebbian  Oja  Sanger  Differential Hebbian 3. Auto association  Linear autoassociator  BSB: Brain State in a Box  Hopfield
  • 14. 3 Design The simplified process for training a Feed Forward Neural Network is as follows: 1. Input data is presented to the network and propagated through the network until it reaches the output layer. This forward process produces a predicted output. 2. The predicted output is subtracted from the actual output and an error value for the networks is calculated. 3. The neural network then uses supervised learning, which in most cases is back propagation, to train the network. Back propagation is a learning algorithm for adjusting the weights. It starts with the weights between the output layer PE’s and the last hidden layer PE’s and works backwards through the network. 4. Once back propagation has finished, the forward process starts again, and this cycle is continued until the error between predicted and actual outputs is minimized. 3.1. The Back Propagation Algorithm: Back propagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. Back propagation is the method of training artificial neural networks so as to minimize the objective function. The back propagation algorithm performs learning on a feed-forward neural network. The back propagation algorithm is used in layered feed forward ANNs. This means that the artificial neurons are organized in layers, and send their signals “forward”, and then the errors are propagated backwards. The back propagation algorithm uses supervised learning, which means that we provide the algorithm with examples of the inputs and outputs we want the network to compute, and then the error (difference between actual and expected results) is calculated. The idea of the back propagation algorithm is to reduce this error, until the ANN learns the training data. Algorithm for a 3-layer network: 1. Initialize the weights in the network 2. Do a. For each example E in the training set oNeural-net-output (network, E); forward pass
  • 15. oT = teacher output for E oCalculate error (T - O) at the output units oCompute ΔWho for all weights from hidden layer to output layer; oBackward pass oCompute ΔWih for all weights from input layer to hidden layer; oBackward pass continued oUpdate the weights in the network 3. Until all examples classified correctly or stopping criterion satisfied 4. Return the network The Back Propagation learning algorithm can be divided into two phases: Phase 1: Propagation This phase involves the following steps: 1. Forward propagation of a training pattern's input through the neural network. 2. Backward propagation of the propagation's output activations through the neural network using the training pattern's target. Phase 2: Weight update For each weight-synapse the following steps are used: 1. Multiply its output delta and input activation to get the gradient of the weight. 2. Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight. Repeat phase 1 and 2 until the performance of the network is satisfactory. 3.2 Pseudo Code for one Layer A single neuron (i.e. processing element) takes in total input PEinput and produces output activation PEout. In this project, we are taking the activation function as Sigmoid Function. Hence we can consider the out PEout=Sigmoid(PEinput). Sigmoid function refers to the special case of the logistic function shown below and defined by the formula
  • 16. Figure 3.1 the sigmoid curve Though other activation functions are often used (e.g. linear or hyperbolic tangent). This has the effect of squashing the infinite range of PEinput into the range 0 to 1. It also has the convenient property that its derivative takes the particularly simple form dS dt = S ∗ (1 − S) Typically, the input PEinput into a given neuron will be the weighted sum of output activations feeding in from a number of other neurons. It is convenient to think of the activations flowing through layers of neurons. So, if there are NumUnitLayer1 neurons in layer 1, the total activation flowing into our layer 2 neuron is the sum over the product OutputLayer1[i]*Wt[i], where Wt[i] is the strength/weight of the connection between PE[i] in layer 1 and our PE in layer 2. Each neuron will also have a bias, or resting state, that is added to the sum of inputs, and it is convenient to call this Wt[0]. We can then write InputLayer2 = Wt[0] // consider the resting state bias weight // for( i = 1 | i < = NumUnitLayer1 | i++ ) // setting loop condition //
  • 17. { Add to InputLayer2 the sum over the product OutputLayer1[i] * Wt[i] } Compute the sigmoid OutputLayer2 = 1 1+e−Input Layer2 to get activation output Similarly layer 2 will have many processing elements as well, so it is appropriate to write the weights between PE[i] in layer 1 and PE[j] in layer 2 as a two dimensional array Wt[i][j]. Thus to get the output of PE[j] in layer 2 we have InputLayer2[j] = Wt[0][j] For ( i = 1 | i < = NumUnitLayer1 | i++ ) { Add to InputLayer2[j] the sum over the product OutputLayer1[i] * Wt[i][j] } Compute the sigmoid OutputLayer2[j] = 1 1+e−Input Layer2[j] to get activation output Now we know that Layer 2 has number of processing units given by NumUnitLayer2 and the above code calculates the output for only one processing element PE[j]. However we require the output for all the processing elements in Layer 2. Hence we introduce another loop to get all the layer 2 outputs For ( j = 1 | j < = NumUnitLayer2 | j++ ) { InputLayer2[j] = Wt[0][j] For ( i = 1 | i < = NumUnitLayer1 | i++ ) { Add to InputLayer2[j] the sum over the product OutputLayer1[i] * Wt[i][j] } Compute sigmoid OutputLayer2[j] = 1 1+e−Input Layer2[j] for output }
  • 18. 3.3 Pseudo Code for all Layers Now that we have calculated the output for all the processing elements in one layer, we can look at writing the code which calculates the output for all the layers in our network. Three layer networks are necessary and sufficient for most purposes, so our layer 2 outputs feed into a third layer in the same way as the above cases. The feed forward neural network chosen for this project has three layers 1, 2, 3 and here is the calculation of output for all three layers of the network For ( j = 1 | j < = NumUnitLayer2 | j++ ) // computes Layer 2 outputs // { InputLayer2[j] = WtLayer1/Layer2[0][j] For ( i = 1 | i < = NumUnitLayer1 | i++ ) { Add to InputLayer2[j] the sum over OutputLayer1[i] * WtLayer1/Layer2 [i][j] } Compute sigmoid OutputLayer2[j] = 1 1+e−Input Layer2[j] for output } For ( k = 1 | k < = NumUnitLayer3 | k++ ) // computes Layer 3 outputs // { InputLayer3[k] = WtLayer2/Layer3[0][k] For ( j = 1 | j < = NumUnitLayer2 | j++ ) { Add to InputLayer3[k] the sum over OutputLayer2[j] * WtLayer2/Layer3 [j][k] } Compute sigmoid OutputLayer3[k] = 1 1+e−Input Layer3[k] for output } To avoid confusion in the pseudo code there is a different index for each layer: i, j, k for Layers 1, 2, 3 respectively. Weights for connections are also different for distinguishing between the different layers, WtLayer1/Layer2 and WtLayer2/Layer3. For obvious reasons, for three layer networks, it is traditional to call layer 1 the Input layer, layer 2 the Hidden layer, and layer 3 the Output layer. The neural network in this project has a design similar to the figure shown below.
  • 19. Figure 3.2 the design for calculating output activation Now we can denote the layers 1, 2, 3 as input layer, hidden layer, and output layer respectively. The weights for the connections have also been denoted appropriately. As shown in the above figure, the initial bias weights are also included in the input for each layer and consequently the output also. For ( j = 1 | j < = NumUnitHidden | j++ ) // computes Hidden Layer PE outputs // { InputHidden[j] = WtInput/Hidden[0][j] For ( i = 1 | i < = NumUnitInput | i++ ) { Add to InputHidden[j] the sum over OutputInput[i] * WtInput/Hidden [i][j] } Compute sigmoid OutputHidden[j] = 1 1+e−Input Hidden [j] for output } For ( k = 1 | k < = NumUnitOuput | k++ ) // computes Output Layer PE outputs // { InputOutput[k] = WtHidden/Output[0][k] For ( j = 1 | j < = NumUnitHidden | j++ ) { Add to InputOutput [k] sum over OutputHidden[j] * WtHidden/Output [j][k] } Compute sigmoid Output[k] = 1 1+e−Input Output[k] for output }
  • 20. 4 Implementation 4.1 Pseudo Code for training patterns In this project, there will be a whole set of training patterns(NumExamples), i.e. pairs of input and target output vectors, Input[E][i] , Target[E][k] labelled by the index E. The network learns by minimizing some measure of the error of the network's actual outputs compared with the target outputs. The sum squared error for all the output units, denoted by k and all training patterns, denoted by E will be given by Error = 0.0 ; For ( E= 1 | E < = NumUnitHidden | E++ ) { For ( k = 1 | k < = NumUnitOuput | k++ ) { Add to Error the sum over the product 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ; } } The factor of 0.5 is conventionally included to simplify the algebra in deriving the learning algorithm. If we insert the above code for computing the network outputs into the E loop of this, we end up with Error = 0.0 ; For ( E= 1 | E < = NumUnitHidden | E++ ) { // computes for all training patterns(E) // For ( j = 1 | j < = NumUnitHidden | j++ ) { InputHidden[E][j] = WtInput/Hidden[0][j] For ( i = 1 | i < = NumUnitInput | i++ ) {
  • 21. Add to InputHidden[E] [j] the sum over OutputInput[E] [i] * WtInput/Hidden [i][j] } Compute sigmoid OutputHidden[E][j] = 1 1+e−Input Hidden[E][j] for output } For ( k = 1 | k < = NumUnitOuput | k++ ) { InputOutput[E] [k] = WtHidden/Output[0][k] For ( j = 1 | j < = NumUnitHidden | j++ ) { Add to InputOutput [E] [k] sum over OutputHidden[E] [j] * WtHidden/Output [j][k] } Compute sigmoid Output[E][k] = 1 1+e−Input Output[E][k] for output Add to Error the sum over the product 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) } } 4.2 Pseudo Code for minimizing error The next stage of the project involves iteratively adjusting the weights to minimize the network's error. The method adopted in this project is by 'gradient descent' on the error function. We can compute how much the error is changed by a small change in each weight (i.e. compute the partial derivatives dError/dWt) and shift the weights by a small amount in the direction that reduces the error. As stated before, we use the back-propagation algorithm. After the calculation of the above sum squared error, we can compute and apply one iteration (or 'epoch') of the required weight changes ΔWho and ΔWih using Error = 0.0 ; For ( E= 1 | E < = NumUnitHidden | E++ ) { // computes for all training patterns(E) // For ( j = 1 | j < = NumUnitHidden | j++ ) {
  • 22. InputHidden[E][j] = WtInput/Hidden[0][j] For ( i = 1 | i < = NumUnitInput | i++ ) { Add to InputHidden[E] [j] the sum over OutputInput[E] [i] * WtInput/Hidden [i][j] } Compute sigmoid OutputHidden[E][j] = 1 1+e−Input Hidden[E][j] for output } For ( k = 1 | k < = NumUnitOuput | k++ ) { InputOutput[E] [k] = WtHidden/Output[0][k] For ( j = 1 | j < = NumUnitHidden | j++ ) { Add to InputOutput [E] [k] sum over OutputHidden[E] [j] * WtHidden/Output [j][k] } Compute sigmoid Output[E][k] = 1 1+e−Input Output[E][k] for output Add to Error the sum over the product 0.5 * (Target[E][k] - Output[E][k]) * (Target[E][k] - Output[E][k]) ; ΔOutput[k] = (Target[E][k] - Output[E][k]) * Output[E][k] * (1 - Output[E][k]) // derivative of the function // } For ( j = 1 | j < = NumUnitHidden | j++ ) { // Back Propagation of error to hidden layer // Sum of ΔOutput [j] = 0.0 For ( k = 1 | k < = NumUnitOuput | k++ ) { Add to Sum of ΔOutput [j] the sum over the product WtHidden/Output [j][k] * ΔOutput [k] ; } ΔH[j] = Sum of ΔOutput [j] * OutputHidden [E][j] * (1.0 - OutputHidden [E][j]) // derivative of the function //
  • 23. } For ( j = 1 | j < = NumUnitHidden | j++ ) { // This loop updates the weight input to hidden // Add to ΔWih [0][j] the sum of: product β * ΔH [j] to the product: α * ΔWih [0][j] Add to WtInput/Hidden [0][j] the change ΔWih [0][j] For ( i = 1 | i < = NumUnitInput | i++ ) { Add to ΔWih [i][j] the sum of product β * InputHidden [p][i] * ΔH [j] to the product: α * ΔWih [i][j] Add to WtInput/Hidden [i][j] the change ΔWih [i][j] } } For ( k = 1 | k < = NumUnitOuput | k++ ) { // This loop updates the weight hidden to output // Add to ΔWho [0][k] the sum of: product β * ΔOutput[k] to the product: α * ΔWho [0][k] Add to WtHidden/Output [0][k] the change ΔWho [0][k] For ( j = 1 | j < = NumUnitHidden | j++ ) { Add to ΔWho [j][k] the sum of product β * OutputHidden [p][j] * ΔOutput [k] to the product: α *ΔWho [j][k] Add to WtHidden/Output [j][k] the change ΔWho [j][k] } } } The weight changes ΔWih and ΔWho are each made up of two components. First, the beta component that is the gradient descent contribution. Second, the alpha component is a 'momentum' term which effectively keeps a moving average of the gradient descent weight change contributions, and thus smoothes out the overall weight changes. The complete training process will consist of repeating the above weight updates for a number of epochs until some error criterion is met.
  • 24. 5 Results Figure 5.1 Output Screenshot The program based on the design discussed in the previous section was executed successfully. The pseudo code was successfully implemented and the three layered feed forward neural network was simulated on the basis of the back propagation algorithm.
  • 25. 6 References [1] Pinkus, A. (1999), "Approximation theory of the MLP model in neural networks," Acta Numerica, 8, 143-196. [2] Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan. [3] Nigrin, A. (1993), Neural Networks for Pattern Recognition, Cambridge, MA: The MIT Press. [4] Zurada, J.M. (1992), Introduction To Artificial Neural Systems, Boston: PWS Publishing Company. [5] Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press. [6] Cichocki, A. and Unbehauen, R. (1993). Neural Networks for Optimization and Signal Processing. NY: John Wiley & Sons, ISBN 0-471-93010-5. [7] Diamantaras, K.I., and Kung, S.Y. (1996) Principal Component Neural Networks: Theory and Applications, NY: Wiley. [8] Fausett, L. (1994), Fundamentals of Neural Networks, Englewood Cliffs, NJ: Prentice Hall. [9] Kosko, B.(1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, N.J.: Prentice-Hall. [10] Masters, T. (1993). Practical Neural Network Recipes in C++, San Diego: Academic Press. [11] Masters, T. (1995) Advanced Algorithms for Neural Networks: A C++ Sourcebook, NY: John Wiley and Sons, ISBN 0-471-10588-0 [12] Oja, E. (1989), "Neural networks, principal components, and subspaces," International Journal of Neural Systems, 1, 61-68. [13] Pao, Y. H. (1989), Adaptive Pattern Recognition and Neural Networks, Reading, MA: Addison-Wesley Publishing Company, ISBN 0-201-12584-6. [14] Reed, R.D., and Marks, R.J, II (1999), Neural Smithing: Supervised Learning in Feed forward Artificial Neural Networks, Cambridge, MA: The MIT Press, ISBN 0-262- 18190-8. [15] Sanger, T.D. (1989), "Optimal unsupervised learning in a single-layer linear Feed forward neural network," Neural Networks, 2, 459-473.