Madras University, Department of Computer Science
Seminar on: Introduction of ANN, Learning Rules and Adaptive Resonance Theory
GROUP MEMBERS: P. JayaVel, J. Joseph Amal Raj, M. Kaja Mohinden
ARTIFICIAL NEURAL NETWORK (ANN)
An artificial neural network (ANN), usually called a "neural network" (NN), is a mathematical or computational model that tries to simulate the structural and/or functional aspects of biological neural networks. It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
ARTIFICIAL NEURAL NETWORK (ANN)
ARTIFICIAL NEURAL NETWORK (ANN)
Why use neural networks?
Neural networks, with their remarkable ability to derive meaning from complicated or imprecise data, can be used to extract patterns and detect trends that are too complex to be noticed by either humans or other computer techniques. A trained neural network can be thought of as an "expert" in the category of information it has been given to analyse. This expert can then be used to provide projections given new situations of interest and answer "what if" questions.
Other advantages include:
Adaptive learning: An ability to learn how to do tasks based on the data given for training or initial experience.
Self-Organisation: An ANN can create its own organisation or representation of the information it receives during learning time.
Real Time Operation: ANN computations may be carried out in parallel, and special hardware devices are being designed and manufactured which take advantage of this capability.
Fault Tolerance via Redundant Information Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage.

ARTIFICIAL NEURAL NETWORK (ANN)
Learning paradigms:
There are three major learning paradigms:
Supervised learning
Unsupervised learning
Reinforcement learning

Supervised learning
In supervised learning, the learning rule is provided with a set of examples (the training set) of proper network behaviour {x1, t1}, {x2, t2}, …, {xQ, tQ}, where xq is an input to the network and tq is the corresponding correct (target) output. As the inputs are applied to the network, the network outputs are compared to the targets. The learning rule is then used to adjust the weights and biases of the network in order to move the network outputs closer to the targets. The perceptron learning rule falls into this supervised learning category. In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain.
Supervised learning
Supervised learning
Feedforward Learning Rules: Quickprop [fahlman88empirical]
Equation 1. Error derivative at the previous epoch: S(t−1) = ∂E/∂w(t−1)
Equation 2. Error derivative at this epoch: S(t) = ∂E/∂w(t)
The Quickprop algorithm is loosely based on Newton's method. It is quicker than standard backpropagation because it uses an approximation to the error curve and second-order derivative information, which allows a quicker evaluation. Training is similar to backprop except that a copy of the error derivative at the previous epoch (eq. 1) is kept. This, together with the current error derivative (eq. 2), is used to minimise an approximation to the error curve.
Supervised learning
The update rule is given in equation 3:
Equation 3. Quickprop update rule: Δw(t) = S(t) / (S(t−1) − S(t)) · Δw(t−1)
This equation uses no learning rate. If the slope of the error curve is less than that of the previous one, then the weight will change in the same direction (positive or negative). However, there need to be some controls to prevent the weights from growing too large.
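As a concrete illustration, here is a minimal Python sketch of one Quickprop step, using the S(t) notation from equations 1–3 above; the max_growth cap is one common control against runaway weights, an assumption rather than something the slides specify:

```python
import numpy as np

def quickprop_step(dE_dw, prev_dE_dw, prev_dw, max_growth=1.75, eps=1e-12):
    """Quickprop update (eq. 3): dw(t) = S(t) / (S(t-1) - S(t)) * dw(t-1).

    dE_dw      -- error derivative at this epoch (eq. 2)
    prev_dE_dw -- error derivative at the previous epoch (eq. 1)
    prev_dw    -- weight change applied at the previous epoch
    """
    denom = prev_dE_dw - dE_dw
    denom = np.where(np.abs(denom) < eps, eps, denom)  # avoid division by zero
    dw = dE_dw / denom * prev_dw                       # secant step, no learning rate
    # One control against weights growing too large: limit each step to
    # max_growth times the size of the previous step.
    limit = max_growth * np.abs(prev_dw)
    return np.clip(dw, -limit, limit)
```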
Unsupervised learning
In unsupervised learning we are given some data x and a cost function to be minimised, which can be any function of the data x and the network's output f. The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables).
Unsupervised learning
As a trivial example, consider the model f(x) = a, where a is a constant, and the cost C = E[(x − f(x))²]. Minimising this cost gives a value of a that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: in compression, for example, it could be related to the mutual information between x and y, whereas in statistical modelling it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximised rather than minimised.) Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.
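A quick numerical check of the trivial example (a hypothetical snippet, not from the slides): for the constant model f(x) = a, the a that minimises C = E[(x − a)²] is indeed the mean of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.5, size=10_000)   # data with mean ~3.0

# Sweep candidate constants a and measure the empirical cost E[(x - a)^2]
candidates = np.linspace(0.0, 6.0, 601)
costs = [np.mean((x - a) ** 2) for a in candidates]
best_a = candidates[int(np.argmin(costs))]

print(best_a, x.mean())   # both are close to 3.0
```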
Unsupervised learning
Unsupervised learning, in contrast to supervised learning, does not provide the network with target output values. This isn't strictly true, as often (and for the cases discussed in this section) the output is identical to the input. Unsupervised learning usually performs a mapping from input to output space, data compression or clustering.
Reinforcement learning
In reinforcement learning, data x are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y(t) and the environment generates an observation x(t) and an instantaneous cost c(t), according to some (usually unknown) dynamics. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision making tasks.
Reinforcement learningThe aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost; i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.ANNs are frequently used in reinforcement learning as part of the overall algorithm. 
Neural Network “Learning Rules”:Successful learning in any neural network is dependent on how the connections between the neurons are allowed to change in response to activity. The manner of change is what the majority of researchers call "a learning rule". However, we will call it a "synaptic modification rule" because although the network learned the sequence, it is not clear that the *connections* between the neurons in the network "learned" anything in particular.
Mathematical synaptic Modification rule
There are many categories of mathematical synaptic modification rule which are used to describe how synaptic strengths should be changed in a neural network. Some of these categories include: backpropagation of error, correlative Hebbian, and temporally-asymmetric Hebbian.
Mathematical synaptic modification rule
Backpropagation of error states that connection strengths should change throughout the entire network in order to minimize the difference between the actual activity and the "desired" activity at the "output" layer of the network.
Mathematical synaptic Modification ruleCorrelative Hebbian states that any two interconnected neurons that are active at the same time should strengthen their connections, so that if one of the neurons is activated again in the future the other is more likely to become activated too.
Mathematical synaptic Modification rule
Temporally-asymmetric Hebbian is described in more detail in the example below, but essentially emphasizes the importance of causality: if a neuron reliably fires before another, its connection to the other neuron should be strengthened. Otherwise, it should be weakened.
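The two Hebbian rules above can be sketched in a few lines of Python. This is a toy illustration only; the exponential timing window and the constants lr and tau are modelling assumptions, not part of the slides.

```python
import numpy as np

def correlative_hebbian(w, pre, post, lr=0.01):
    """Co-active neurons strengthen their connection: dw = lr * pre * post."""
    return w + lr * pre * post

def temporally_asymmetric_hebbian(w, dt, lr=0.01, tau=20.0):
    """STDP-style rule, with dt = t_post - t_pre (in ms, assumed units).

    If the presynaptic neuron reliably fires first (dt > 0), strengthen
    the connection; otherwise weaken it, with an exponential window.
    """
    if dt > 0:
        return w + lr * np.exp(-dt / tau)
    return w - lr * np.exp(dt / tau)

print(temporally_asymmetric_hebbian(0.5, dt=+5.0))  # pre before post: stronger
print(temporally_asymmetric_hebbian(0.5, dt=-5.0))  # post before pre: weaker
```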
Neural Network “Learning Rules”:
The Delta Rule
The Pattern Associator
The Hebb Rule
The Delta RuleA generalized form of the delta rule, developed by D.E. Rumelhart, G.E. Hinton, and R.J. Williams, is needed for networks with hidden layers. They showed that this method works for the class of semilinear activation functions (non-decreasing and differentiable).Generalizing the ideas of the delta rule, consider a hierarchical network with an input layer, an output layer and a number of hidden layers.
The Delta Rule
We will consider only the case where there is one hidden layer. The network is presented with input signals which produce output signals that act as input to the middle layer. Output signals from the middle layer in turn act as input to the output layer to produce the final output vector. This vector is compared to the desired output vector. Since both the output and the desired output vectors are known, the delta rule can be used to adjust the weights in the output layer.
The Delta Rule
Can the delta rule be applied to the middle layer? Both the input signal to each unit of the middle layer and the output signal are known. What is not known is the error generated from the output of the middle layer, since we do not know the desired output. To get this error, backpropagate through the middle layer to the units that are responsible for generating that output. The error generated from the middle layer can then be used with the delta rule to adjust the weights.
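A minimal sketch of this procedure for one hidden layer, using sigmoid units (a common choice of semilinear activation); the shapes and learning rate are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_rule_step(x, target, W1, W2, lr=0.5):
    """One generalized-delta-rule step: x (n,), W1 (h, n), W2 (m, h)."""
    # Forward pass: input -> middle layer -> output layer
    h = sigmoid(W1 @ x)
    y = sigmoid(W2 @ h)

    # Output layer: the desired output is known, so apply the delta rule directly
    delta_out = (target - y) * y * (1 - y)

    # Middle layer: backpropagate the output error through W2 to get the
    # error for which the hidden units are responsible
    delta_hid = (W2.T @ delta_out) * h * (1 - h)

    # Adjust the weights in both layers
    W2 += lr * np.outer(delta_out, h)
    W1 += lr * np.outer(delta_hid, x)
    return y
```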
The Pattern Associator
A pattern associator learns associations between input patterns and output patterns. One of the most appealing characteristics of such a network is the fact that it can generalize what it learns about one pattern to other similar input patterns. Pattern associators have been widely used in distributed memory modeling.
The Pattern Associator
The pattern associator is one of the more basic two-layer networks. Its architecture consists of two sets of units, the input units and the output units. Each input unit connects to each output unit via weighted connections. Connections are only allowed from input units to output units.
The Pattern AssociatorThe effect of a unit ui in the input layer on a unit uj in the output layer is determined by the product of the activation ai of ui and the weight of the connection from ui to uj. The activation of a unit uj in the output layer is given by: SUM(wij * ai).
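In code, recall is just a matrix-vector product; the weights below are learned with a Hebb-style outer product, one common choice for pattern associators and an assumption here rather than something the slides prescribe:

```python
import numpy as np

a_in = np.array([1.0, 0.0, 1.0])    # input pattern
t_out = np.array([0.0, 1.0])        # output pattern to associate with it

W = np.outer(t_out, a_in)           # Hebbian learning: W[j, i] = t_j * a_i
a_out = W @ a_in                    # recall: a_j = SUM(w_ij * a_i)
print(a_out)                        # proportional to the stored target
```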
Adaptive Resonance Theory (ART)
Discrete Bidirectional Associative Memory
Kohonen Self-Organizing Map
Counter Propagation Network (CPN)
Perceptron
Vector Representation
ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element)
Madaline (Multiple Adaline)
Backpropagation, or propagation of error
Adaptive Resonance Theory (ART) Adaptive Resonance Theory (ART) is a theory developed by Stephen Grossberg and Gail Carpenter on aspects of how the brain processes information. It describes a number of neural network models which use supervised and unsupervised learning methods, and address problems such as pattern recognition and prediction.
Discrete Bidirectional Associative Memory
Kohonen Self-Organizing Map
The self-organizing map (SOM) invented by Teuvo Kohonen performs a form of unsupervised learning. A set of artificial neurons learn to map points in an input space to coordinates in an output space. The input space can have different dimensions and topology from the output space, and the SOM will attempt to preserve these.
Kohonen Self-Organizing Map
If an input space is to be processed by a neural network, the first issue of importance is the structure of this space. A neural network with real inputs computes a function f defined from an input space A to an output space B. The region where f is defined can be covered by a Kohonen network in such a way that when, for example, an input vector is selected from the region a1, only one unit in the network fires. Such a tiling, in which input space is classified in subregions, is also called a chart or map of input space. Kohonen networks learn to create maps of the input space in a self-organizing way.
Kohonen Self-Organizing Map - Advantages
Probably the best thing about SOMs is that they are very easy to understand. It's very simple: if two map units are close together and there is grey connecting them, then they are similar; if there is a black ravine between them, then they are different. Unlike Multidimensional Scaling or N-land, people can quickly pick up how to use them in an effective manner. Another great thing is that they work very well. As I have shown you, they classify data well, and they are easily evaluated for their own quality, so you can actually calculate how good a map is and how strong the similarities between objects are.
Kohonen Self-Organizing Map
Perceptron
The perceptron is a type of artificial neural network invented in 1957 at the Cornell Aeronautical Laboratory by Frank Rosenblatt. It can be seen as the simplest kind of feedforward neural network: a linear classifier. The perceptron is a binary classifier that maps its input x (a real-valued vector) to an output value f(x) (a single binary value):
f(x) = 1 if w · x + b > 0, and 0 otherwise,
where w is a vector of real-valued weights, w · x is the dot product (which computes a weighted sum), and b is the 'bias', a constant term that does not depend on any input value.
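A minimal sketch of the perceptron and its learning rule; the AND data, learning rate and epoch count are illustrative assumptions:

```python
import numpy as np

def perceptron_output(w, b, x):
    """f(x) = 1 if w . x + b > 0 else 0 -- the binary classifier above."""
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(samples, n_inputs, lr=0.1, epochs=20):
    """Perceptron learning rule sketch: nudge weights by the error."""
    w, b = np.zeros(n_inputs), 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - perceptron_output(w, b, x)
            w += lr * err * np.asarray(x, dtype=float)
            b += lr * err
    return w, b

# Toy usage: learn logical AND, which is linearly separable
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(data, n_inputs=2)
print([perceptron_output(w, b, x) for x, _ in data])  # [0, 0, 0, 1]
```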
ADALINE
Definition
Adaline is a single-layer neural network with multiple nodes, where each node accepts multiple inputs and generates one output. Given the following variables:
x is the input vector
w is the weight vector
n is the number of inputs
θ some constant
y is the output
then we find that the output is y = Σ (from j = 1 to n) xj wj + θ. If we further assume that
x(n+1) = 1
w(n+1) = θ
then the output reduces to the dot product of x and w: y = x · w.
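The bias-folding trick above translates directly into code. The sketch below trains the Adaline with the Widrow-Hoff (LMS) rule, its usual training procedure; the learning rate and epoch count are assumed values:

```python
import numpy as np

def adaline_train(X, d, lr=0.01, epochs=100):
    """LMS (Widrow-Hoff) training sketch for the Adaline above.

    A constant 1 is appended to each input so that the bias theta
    becomes w[n], and the output is just the dot product of x and w.
    """
    Xb = np.hstack([X, np.ones((len(X), 1))])   # x_{n+1} = 1
    w = np.zeros(Xb.shape[1])                   # w_{n+1} plays the role of theta
    for _ in range(epochs):
        for x, target in zip(Xb, d):
            y = np.dot(x, w)                    # linear output
            w += lr * (target - y) * x          # Widrow-Hoff / delta rule
    return w
```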
Madaline Madaline (Multiple Adaline) is a two layer neural network with a set of ADALINEs in parallel as its input layer and a single PE (processing element) in its output layer. For problems with multiple input variables and one output, each input is applied to one Adaline. For similar problems with multiple outputs, madalines in parallel can be used. The madaline network is useful for problems which involve prediction based on multiple inputs, such as weather forecasting (Input variables: barometric pressure, difference in pressure. Output variables: rain, cloudy, sunny).
Backpropagation
Backpropagation, or propagation of error, is a common method of teaching artificial neural networks how to perform a given task. It was first described by Arthur E. Bryson and Yu-Chi Ho in 1969,[1][2] but it wasn't until 1986, through the work of David E. Rumelhart, Geoffrey E. Hinton and Ronald J. Williams, that it gained recognition, and it led to a “renaissance” in the field of artificial neural network research. It is a supervised learning method, and is an implementation of the Delta rule. It requires a teacher that knows, or can calculate, the desired output for any given input. It is most useful for feed-forward networks (networks that have no feedback, or simply, that have no connections that loop). The term is an abbreviation for "backwards propagation of errors". Backpropagation requires that the activation function used by the artificial neurons (or "nodes") is differentiable.
Backpropagation
Calculation of error: d_k = f(D_k) − f(O_k)
Network Structure – Back-propagation Network (diagram): output units O_i; hidden units a_j, connected to the output units by weights W_j,i; input units I_k, connected to the hidden units by weights W_k,j.
Counter propagation network (CPN) (§ 5.3)
Basic idea of CPN
Purpose: fast and coarse approximation of a vector mapping y = f(x), not to map any given x to its f(x) with a given precision. Input vectors x are divided into clusters/classes, and each cluster of x has one output y, which is (hopefully) the average of f(x) for all x in that class.
Architecture (simple case, FORWARD ONLY CPN): an input layer x (units x_1 … x_n), a hidden (class) layer z (units z_1 … z_p) with weights w_k,i from input to hidden (class), and an output layer y (units y_1 … y_m) with weights v_j,k from hidden (class) to output.
Learning proceeds in two phases on training samples (x, d), where d = f(x) is the desired precise mapping.
Phase 1: the weights w_k coming into the hidden nodes z_k are trained by competitive learning to become the representative vector of a cluster of input vectors x (use only x, the input part of (x, d)):
1. For a chosen x, feedforward to determine the winning z_k*.
2. Update the winner's weights: w_k* = w_k* + η(x − w_k*).
3. Reduce η, then repeat steps 1 and 2 until the stop condition is met.
Phase 2: the weights going out of the hidden nodes z_k are trained by the delta rule to be an average output of f(x), where x is an input vector that causes z_k to win (use both x and d):
1. For a chosen x, feedforward to determine the winning z_k*.
2. Update w_k* = w_k* + η(x − w_k*) (optional).
3. Update the winner's outgoing weights: v_j,k* = v_j,k* + β(d_j − v_j,k*).
4. Repeat steps 1–3 until the stop condition is met.
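A compact sketch of the two training phases for a forward-only CPN; winner selection by nearest weight vector and the η/β schedules are illustrative assumptions:

```python
import numpy as np

def train_cpn(samples, n_hidden, eta=0.1, beta=0.1, epochs=50, seed=0):
    """Two-phase forward-only CPN training sketch.

    samples  -- list of (x, d) pairs, x the input, d the desired output
    n_hidden -- number of hidden (cluster) nodes, fixed in advance
    eta/beta -- learning rates for phase 1 and phase 2 (assumed names)
    """
    rng = np.random.default_rng(seed)
    n_in = len(samples[0][0])
    n_out = len(samples[0][1])
    W = rng.normal(size=(n_hidden, n_in))    # input  -> hidden weights
    V = np.zeros((n_out, n_hidden))          # hidden -> output weights

    # Phase 1: competitive learning moves each winner toward its cluster.
    for _ in range(epochs):
        for x, _ in samples:
            k = np.argmin(np.linalg.norm(W - x, axis=1))  # winning node
            W[k] += eta * (x - W[k])
        eta *= 0.95                           # reduce eta, then repeat

    # Phase 2: delta rule trains each winner's outgoing weights toward
    # the average desired output of its cluster.
    for _ in range(epochs):
        for x, d in samples:
            k = np.argmin(np.linalg.norm(W - x, axis=1))
            V[:, k] += beta * (d - V[:, k])
    return W, V
```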
Adaptive Resonance Theory
Adaptive Resonance Theory
Adaptive Resonance Theory (ART) was developed by Grossberg (1976). Input vectors which are close to each other according to a specific similarity measure should be mapped to the same cluster. ART adapts itself by storing input patterns, and tries to best match the input pattern.
Adaptive Resonance Theory 1 (ART 1)
ART 1 is a binary classification model. Various other versions of the model have evolved from ART 1; pointers to these can be found in the bibliographic remarks. The main network comprises the layers F1, F2 and the attentional gain control as the attentional subsystem. The attentional vigilance node forms the orienting subsystem.
ART 1: Architecture (diagram: the attentional subsystem comprises layers F1 and F2 with gain control G; the orienting subsystem comprises the vigilance node A; input I feeds F1, with excitatory (+) and inhibitory (−) connections)
ART 1: 2/3 Rule (diagram: F2 node J with output S_i(y_j), outstar weights v_ji, F1 signals s_i, gain signal s_G, input I_i)
Three kinds of inputs to each F1 neuron decide when the neuron fires:
External input I_i
Top-down feedback through outstar weights v_ji
Gain control signal s_G

ART 1: 2/3 Rule
The gain control signal s_G = 1 if I is presented and all neurons in F2 are inactive. s_G is nonspecific. When the input is initially presented to the system, s_G = 1. As soon as a node J in F2 fires as a result of competition, s_G = 0.
Adaptive Resonance Theory (ART) ART1: for binary patterns; 	ART2: for continuous patterns
Motivations: Previous methods have the following problems:
Number of class nodes is pre-determined and fixed; under- and over-classification may result from training.
Some nodes may have empty classes.
No control of the degree of similarity of inputs grouped in one class.
Training is non-incremental: with a fixed set of samples, adding new samples often requires re-training the network with the enlarged training set until a new stable state is reached.
Ideas of the ART model:
Suppose the input samples have been appropriately classified into k clusters (say, by some fashion of competitive learning); each weight vector w_j is a representative (average) of all samples in that cluster.
When a new input vector x arrives:
Find the winner j* among all k cluster nodes.
Compare w_j* with x: if they are sufficiently similar (x resonates with class j*), then update w_j* based on x; else, find/create a free class node and make x its first member.
To achieve these, we need:
a mechanism for testing and determining (dis)similarity between x and w_j;
a control for finding/creating new class nodes;
all operations implemented by units of local computation.
Only the basic ideas are presented here, simplified from the original ART model. Some of the control mechanisms realized in the original by various specialized neurons are done here by logic statements of the algorithm.
ART1 Architecture
Working of ART1
Three phases after each input vector x is applied.
Recognition phase: determine the winner cluster for x, using the bottom-up weights b. The winner j* has the maximum activation y_j* = b_j* · x, and x is tentatively classified to cluster j*. The winner may still be far away from x (e.g., |t_j* − x| is unacceptably large).
Working of ART1 (3 phases)
Comparison phase: compute the similarity using the top-down weights t, via the vector s = x ∧ t_j* (componentwise AND). If (# of 1's in s)/(# of 1's in x) > ρ, accept the classification and update b_j* and t_j*; else, remove j* from further consideration, and look for another potential winner or create a new node with x as its first pattern.
Weight update/adaptive phase
Initial weights (no bias): bottom-up b_ij = 1/(1 + n); top-down t_ji = 1. When a resonance occurs with node j*, set t_j* to s = x ∧ t_j* and make b_j* proportional to s (e.g., b_ij* = 2·s_i/(1 + |s|) in the fast-learning form). If k sample patterns are clustered to node j, then t_j = the pattern whose 1's are common to all these k samples.
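The three phases combine into the following sketch (fast-learning ART1; the exact weight formulas are the assumed textbook parameterisation mentioned above, with L = 2):

```python
import numpy as np

def art1(patterns, rho):
    """ART1 clustering sketch for binary patterns (rows of a 0/1 array)."""
    b = []                                     # bottom-up weights, one per cluster
    t = []                                     # top-down prototype vectors
    assignment = []
    for x in patterns:
        candidates = list(range(len(b)))
        while True:
            if not candidates:                 # no cluster fits: create a new one
                t.append(x.copy())
                b.append(2.0 * x / (1.0 + x.sum()))
                assignment.append(len(b) - 1)
                break
            # Recognition: the winner has the largest bottom-up activation b_j . x
            j = max(candidates, key=lambda jj: np.dot(b[jj], x))
            s = x & t[j]                       # comparison: s = x AND t_j*
            if s.sum() / x.sum() > rho:        # resonance: accept and learn
                t[j] = s
                b[j] = 2.0 * s / (1.0 + s.sum())
                assignment.append(j)
                break
            candidates.remove(j)               # reset: inhibit j, continue search
    return assignment, t
```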
Example for input x(1): node 1 wins.
Notes
Classification as a search process.
No two classes have the same b and t.
Outliers that do not belong to any cluster will be assigned separate nodes.
Different ordering of sample input presentations may result in different classifications.
Increasing ρ increases the number of classes learned and decreases the average class size.
Classification may shift during search, but will reach stability eventually.
There are different versions of ART1 with minor variations.
ART2 is the same in spirit but different in details.
ART1 Architecture (diagram: input, interface and cluster layers with control units G1 and G2 and reset unit R, connected by excitatory (+) and inhibitory (−) links)
Cluster units: competitive; they receive the input vector x through weights b to determine the winner j.
Input units: placeholders for external inputs.
Interface units: pass s to x as the input vector for classification, by comparing x and t_j, controlled by gain control unit G1.
The three phases need to be sequenced (by control units G1, G2, and R).
R = 0: resonance occurs; update b_J and t_J.
R = 1: the similarity test fails; J is inhibited from further computation.
ART clustering algorithms:
ART1
ART2
ART3
ARTMAP
Fuzzy ART

Fuzzy ART Modeling
Fuzzy ART
Layer 1 consists of neurons that are connected to the neurons in Layer 2 through weight vectors. The number of neurons in Layer 1 depends on the characteristics of the input data. Layer 2 represents clusters.
Fuzzy ART Architecture
Fuzzy ART FMEA
FMEA values are evaluated separately with severity, detection and occurrence values. The aim is to apply the Fuzzy ART algorithm to the FMEA method and, by performing FMEA on test problems, to investigate the most favorable parameter combinations (α, β and ρ).
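For reference, one presentation step of standard Fuzzy ART can be sketched as follows, using the usual choice function T_j = |I∧w_j|/(α+|w_j|), vigilance test |I∧w_j|/|I| ≥ ρ and learning rule w_j ← β(I∧w_j) + (1−β)w_j; complement coding is omitted here for brevity:

```python
import numpy as np

def fuzzy_art_step(I, W, alpha, beta, rho):
    """One Fuzzy ART presentation (sketch of the standard operations).

    I     -- input vector with components in [0, 1]
    W     -- list of category weight vectors (modified in place)
    alpha -- choice parameter, beta -- learning rate, rho -- vigilance
    Returns the index of the category that the input joins.
    """
    order = sorted(range(len(W)),
                   key=lambda j: -np.minimum(I, W[j]).sum() / (alpha + W[j].sum()))
    for j in order:                                # categories by choice value
        match = np.minimum(I, W[j]).sum() / I.sum()
        if match >= rho:                           # vigilance test passed
            W[j] = beta * np.minimum(I, W[j]) + (1 - beta) * W[j]
            return j
    W.append(I.copy())                             # no category fits: new one
    return len(W) - 1
```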
Hand-worked Example
Cluster the vectors 11100, 11000, 00001, 00011.
Low vigilance: 0.3
High vigilance: 0.7
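Using the ART1 sketch from earlier, the two vigilance settings can be compared directly; with that parameterisation, ρ = 0.3 merges 00001 and 00011 into one cluster, while ρ = 0.7 keeps them apart:

```python
import numpy as np

vectors = np.array([[1, 1, 1, 0, 0],   # 11100
                    [1, 1, 0, 0, 0],   # 11000
                    [0, 0, 0, 0, 1],   # 00001
                    [0, 0, 0, 1, 1]])  # 00011

for rho in (0.3, 0.7):
    assignment, prototypes = art1(vectors, rho)   # sketch defined above
    print(rho, assignment, [p.tolist() for p in prototypes])
```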
Hand-worked Example: ρ = 0.3
ART 1: Clustering Application (ρ = 0.3)
Hand-worked Example: ρ = 0.7
ART 1: Clustering Application (ρ = 0.7)
Neurophysiological Evidence for ART Mechanisms
The attentional subsystem of an ART network has been used to model aspects of the inferotemporal cortex. The orienting subsystem has been used to model a part of the hippocampal system, which is known to contribute to memory functions. The feedback prevalent in an ART network can help focus attention in models of visual object recognition.
Other Applications
Aircraft Part Design Classification System. See text for details.
Ehrenstein Pattern Explained by ART!
The pattern generates a circular illusory contour – a circular disc of enhanced brightness. The bright disc disappears when the alignment of the dark lines is disturbed!
Other Neurophysiological Evidence
Adam Sillito [University College, London]: Cortical feedback in a cat tunes cells in its LGN to respond best to lines of a specific length.
Chris Redie [MPI Entwicklungsbiologie, Germany]: Found that some visual cells in a cat's LGN and cortex respond best at line ends – more strongly to line ends than line sides.
Sillito et al. [University College, London]: Provide neurophysiological data suggesting that the cortico-geniculate feedback closely resembles the matching and resonance of an ART network. Cortical feedback has been found to change the output of specific LGN cells, increasing the gain of the input for feature-linked events that are detected by the cortex.
Computational Experiment
A non-binary dataset of FMEA is used to evaluate the performance of the Fuzzy ART neural network on different test problems.
Computational Experiment
For a comprehensive analysis of the effects of parameters on the performance of Fuzzy ART in the FMEA case, a number of levels of parameters are considered.
Computational Experiment
The Fuzzy ART neural network method is applied to determine the most favorable parameter (α, β and ρ) combinations during the application of FMEA on test problems.
Results
For any test problem, 900 solutions are obtained. The β–ρ interactions for parameter combinations are considered where solutions are obtained. For each test problem, all the combinations are evaluated and the frequency distribution of clusters is constructed.

Results
For example, for test problem 1, four groups which comprise 70% of the combinations are selected; cluster numbers that contain at least 80% of all the combinations are determined according to the results of Pareto analysis. These are groups 2, 3 and 4.
Results
Parameter combinations, β–ρ interactions and the number of α parameters in any combination of β and ρ are shown at the side. Favorable solutions are marked in bold italic.
Results
The number of clusters increases with the increase in ρ. The number of clusters increases with the increase in β. Clustering of the data in most problems depends on the interaction between the β and ρ parameters. The α parameter has no effect on the solution in small-scale problems, but in large-scale problems the effect of α becomes irregular. Also, with the increase in problem scale, the change in the number of clusters is defined.
Results
In the FMEA test problems, which determine the most favorable parameter combinations, the β–ρ interactions providing appropriate cluster numbers are noted on a summary table that evaluates each test problem separately. The values involving favorable β–ρ combinations are marked with the blue area; this is a suitable solution area for the FMEA problem.
ART 1: Clustering Application
Clustering pixel-based alphabet images.
Conclusion and Discussion
The Fuzzy ART neural network is applied to FMEA. Appropriate parameter intervals are investigated for Fuzzy ART to give successful results on FMEA problems. The investigations show that if the input number is smaller than or equal to 30, the FMEA problem is defined as small scale; otherwise it is large scale. We suggest that cluster numbers should be kept between 2 and 6 in small-scale problems for practical studies. Cluster numbers of large-scale problems should be at most 12 for practical studies.
Conclusion and Discussion
Determinations about α: In small-scale problems, α increases the cluster number only if β is greater than or equal to 0.8; in other conditions, α values are observed to have no effect on the solution. In large-scale problems, an appropriate interval cannot be determined because the effect of α becomes irregular.
Determinations about β: For both small- and large-scale problems, the number of clusters increases with the increase in β.
Determinations about ρ: For both small- and large-scale problems, the number of clusters increases with the increase in ρ.
Conclusion and Discussion
For small- and large-scale problems in FMEA, the Fuzzy ART algorithm is fast, effective and easy to implement. Parameter combinations are acquired where the best solution is obtained for non-binary problems.
References:
Carpenter, G.A. and Grossberg, S. (1987a), "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing 37, 54-115.
Carpenter, G.A. and Grossberg, S. (1987b), "ART 2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics 26, 4919-4930.
Carpenter, G.A. and Grossberg, S. (1990), "ART 3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks 3, 129-152.
Carpenter, G.A., Grossberg, S. and Reynolds, J.H. (1991a), "ARTMAP: Supervised real-time learning and classification of non-stationary data by a self-organizing neural network", Neural Networks 4, 565-588.