MADRAS UNIVERSITY DEPARTMENT OF  COMPUTER SCIENCE
ADALINE AND MADALINE ARTIFICIAL NEURAL NETWORK
GROUP MEMBERS ARE : D.ASHA G.CHAMUNDESWARI R.DEEPA LAKSHMI
ADALINE
What is an ADALINE Network? ADALINE stands for Adaptive Linear Element. It is a simple perceptron-like system that performs classification by modifying its weights so as to diminish the mean square error (MSE) at every iteration; this can be accomplished using gradient descent. The adaptive linear element (Adaline) is used in neural networks for adaptive filtering and pattern recognition.
ADALINE - ARCHITECTURE
Using ADALINE Networks
Initialize: assign random weights to all links.
Training: feed in known inputs in random sequence, simulate the network, compute the error between the desired output and the actual output (error function), adjust the weights (learning function), and repeat until the total error < ε.
Thinking: simulate the network; it will respond to any input, but a correct solution is not guaranteed even for trained inputs.
Adaline – Widrow-Hoff Learning. The learning idea is as follows: define an error function that measures the performance of the network in terms of the weights, input, output and desired output. Take the derivative of this function with respect to the weights, and modify the weights so that the error is decreased. This is also known as the Least Mean Square (LMS) error algorithm, the Widrow-Hoff rule, or the Delta rule.
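For completeness, the gradient step described above can be written out (a standard derivation, not reproduced from the slides; the factor of 2 is absorbed into the learning rate α):

```latex
E = (t - y_{in})^{2}, \qquad y_{in} = b + \sum_{j} x_{j} w_{j},
\qquad
\frac{\partial E}{\partial w_{j}} = -2\,(t - y_{in})\,x_{j}
\;\Longrightarrow\;
\Delta w_{j} = \alpha\,(t - y_{in})\,x_{j}.
```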
The ADALINE
The Widrow-Hoff rule (also known as the Delta rule) minimizes the error between the desired output t and the net input y_in, i.e. the squared error for each pattern:
E = (t − y_in)²
Gradient descent update:
w_ij(new) = w_ij(old) + α (t_i − y_in_i) x_j
Example: with a single input X1 feeding output Y1, if s = 1 and t = 0.5, the graph of E against w_1,1 is a parabola with its minimum (E = 0) at w_1,1 = 0.5.
The ADALINE learning algorithm
Step 0: Initialize all weights to small random values and set the learning rate, e.g. α = 0.2.
Step 1: While the stopping condition is false:
Step 1.1: For each training pair s:t:
Step 1.1.1: Set activations on the input units: x_j = s_j
Step 1.1.2: Compute the net input to the output units: y_in_i = b_i + Σ_j x_j w_ij
Step 1.1.3: Update bias and weights:
b_i(new) = b_i(old) + α (t_i − y_in_i)
w_ij(new) = w_ij(old) + α (t_i − y_in_i) x_j
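A minimal sketch of these steps in Python (the training data, learning rate, tolerance, and epoch cap below are illustrative choices, not values from the slide):

```python
import numpy as np

def train_adaline(samples, targets, alpha=0.2, tol=1e-4, max_epochs=1000):
    """Delta-rule (Widrow-Hoff) training for a single ADALINE unit."""
    n_features = samples.shape[1]
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.1, 0.1, n_features)    # small random weights
    b = rng.uniform(-0.1, 0.1)                # bias

    for _ in range(max_epochs):
        largest_change = 0.0
        for x, t in zip(samples, targets):
            y_in = b + x @ w                  # net input (linear output)
            delta = alpha * (t - y_in)        # Widrow-Hoff update term
            b += delta
            w += delta * x
            largest_change = max(largest_change, abs(delta))
        if largest_change < tol:              # stop when updates become negligible
            break                             # (may instead hit max_epochs)
    return w, b

# Usage: learn the AND function on bipolar inputs (illustrative data).
# The linear output keeps a small residual error, so training caps at
# max_epochs, but the sign of the output still realizes AND.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, 1], dtype=float)
w, b = train_adaline(X, t)
print(np.sign(X @ w + b))   # -> [-1. -1. -1.  1.]
```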
Least Square Minimization
Find the gradient of the error over all examples; either calculate the minimum directly or move opposite to the gradient.
Widrow-Hoff (LMS): use the instantaneous example as an approximation to the gradient.
Advantages: no memory; on-line; serves a similar function to noise in avoiding local problems.
Adjust by w(new) = w(old) + α δ x for each x, where δ = (desired output − w·x).
LMS (Least Mean Square Algorithm)
1. Apply an input x(k) to the Adaline.
2. Find the squared error of the current input: Errsq(k) = (d(k) − W·x(k))².
3. Approximate Grad(Errsq) by differentiating Errsq and approximating the average Errsq by Errsq(k); this gives −2 err(k) x(k), where err(k) = d(k) − W·x(k).
4. Update W: W(new) = W(old) + 2 μ err(k) x(k).
5. Repeat steps 1 to 4.
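A one-step sketch of this procedure in the slide's notation, assuming a step size μ (the function name and default value are illustrative):

```python
import numpy as np

def lms_step(W, x_k, d_k, mu=0.05):
    """One LMS iteration: approximate the gradient of the squared error by the
    instantaneous sample and step against it."""
    err = d_k - W @ x_k                 # err(k) = d(k) - W.x(k)
    errsq = err ** 2                    # Errsq(k), the instantaneous squared error
    W_new = W + 2 * mu * err * x_k      # W(new) = W(old) + 2*mu*err(k)*x(k)
    return W_new, errsq
```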
Mean Square Error
(The slide's equations for the training set, the notation, and the mean square error are given as figures; the MSE is the squared error between target and network output, averaged over the training set.)
Supervised neural networks that use an MSE cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate of the variance. This value can then be used to calculate a confidence interval for the output of the network, assuming a normal distribution.
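As a rough illustration of the statistical point above, the validation MSE can serve as a variance estimate for a normal-approximation interval around a prediction (the helper name and the 95% z-value of 1.96 are assumptions, not from the slides):

```python
import numpy as np

def prediction_interval(y_pred, val_targets, val_preds, z=1.96):
    """Approximate confidence interval for a network output, treating the
    validation-set MSE as an estimate of the output variance (normality assumed)."""
    mse = np.mean((np.asarray(val_targets) - np.asarray(val_preds)) ** 2)
    half_width = z * np.sqrt(mse)
    return y_pred - half_width, y_pred + half_width
```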
Error Analysis
The mean square error for the ADALINE network is a quadratic function of the weights (the slide's equation is given as a figure).
Adaptive Filtering (Tapped Delay Line / Adaptive Filter)
An adaptive filter is a filter that self-adjusts its transfer function according to an optimizing algorithm. Because of the complexity of the optimizing algorithms, most adaptive filters are digital filters that perform digital signal processing and adapt their performance based on the input signal.
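A small sketch of the tapped-delay-line idea mentioned above: the adaptive filter's input at time k is the current sample plus its R most recent delayed copies. The delay length R and the zero-padding convention are illustrative assumptions.

```python
import numpy as np

def tapped_delay_line(signal, R):
    """Return, for each time step k, the input vector [x(k), x(k-1), ..., x(k-R)]."""
    padded = np.concatenate([np.zeros(R), np.asarray(signal, dtype=float)])
    return np.array([padded[k:k + R + 1][::-1] for k in range(len(signal))])

# Usage: feed each row to an ADALINE/LMS update to obtain an adaptive FIR filter.
taps = tapped_delay_line([1.0, 2.0, 3.0, 4.0], R=2)
# taps[2] == [3., 2., 1.]  -> current sample first, older samples after it
```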
Adaptive filter
F1 registers the input pattern. Signals S_i are modulated through weighted connections. F2 computes the pattern match between the input and the weights:
Σ_i x_i w_ij = X · W_j = |X| |W_j| cos(X, W_j)
Adaptive filter elements
The dot product computes the projection of one vector on another. The term |X||W_j| denotes the energy, whereas cos(X, W_j) denotes the pattern. If both vectors are normalized (|X| = |W_j| = 1), then X · W_j = cos(X, W_j). This indicates how well the weight vector of the neuron matches the input vector. The neuron with the largest activity at F2 has the weights closest to the input.
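A tiny sketch of this normalized pattern match (the function name is an illustrative choice):

```python
import numpy as np

def cosine_match(x, w):
    """With both vectors normalized, the activation x.w reduces to cos(x, w):
    a similarity score in [-1, 1], largest for the best-matching weight vector."""
    x_hat = x / np.linalg.norm(x)
    w_hat = w / np.linalg.norm(w)
    return float(x_hat @ w_hat)
```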
Applications
Adaline has better convergence properties than the Perceptron. It is useful in noise correction, and there is an Adaline in every modem.
Example: Noise Cancellation
Noise Cancellation Adaptive Filter
LMS Response
Echo Cancellation
Echo arises in long-distance telephone lines. An adaptive filter deals with the echo problem by mimicking the leakage of the incoming voice, so that the leaked speech can be suppressed from the outgoing signal.
n: incoming voice; s: outgoing voice; n′: noise (the leakage of the incoming voice into the outgoing line); y: the output of the filter, which mimics n′.
The outgoing line carries s + n′, so the error is e = s + n′ − y. Since s is not correlated with y (or with n′),
⟨e²⟩ = ⟨s²⟩ + ⟨(n′ − y)²⟩,
so minimizing ⟨e²⟩ is equivalent to minimizing ⟨(n′ − y)²⟩: the filter output converges to the leakage, and the error converges to the clean outgoing voice s.
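A self-contained sketch of this scheme; all signals, the leakage model, the filter length R, and the step size μ below are fabricated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
N, R, mu = 2000, 8, 0.01
n = rng.standard_normal(N)                     # incoming voice (reference input)
s = np.sin(0.05 * np.arange(N))                # outgoing voice we want to keep
leak = np.convolve(n, [0.6, 0.3, 0.1])[:N]     # n': leaked incoming voice (echo)
line = s + leak                                # what the outgoing line actually carries

w = np.zeros(R)
cleaned = np.zeros(N)
for k in range(R, N):
    x = n[k - R + 1:k + 1][::-1]   # current + delayed copies of the incoming voice
    y = w @ x                      # filter output: current estimate of the leakage n'
    e = line[k] - y                # error = s + n' - y; converges toward s[k]
    w += 2 * mu * e * x            # LMS weight update
    cleaned[k] = e                 # the echo-suppressed outgoing signal
```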
Adaline Device for Medical Diagnosis
The Adaline's input devices allow the computer to see, ''feel,'' or ''hear'' its instructions; this is the training phase. Let's illustrate this by the way a doctor observes a multitude of symptoms, some precisely measured, such as temperature or blood pressure, and some more subtle, such as coloring, pain patterns, or demeanor. Almost subconsciously, the doctor attaches a weight, or significance, to each of the symptoms, based on long experience with many diseases and many patients, and combines these effects to arrive at a diagnosis.
EXAMPLE FOR  ADALINE
Comparison with Perceptron
Both use an updating rule that changes with each input. One fixes binary errors; the other minimizes a continuous error. Adaline always converges; see what happens with XOR. Both can represent only linearly separable functions.
Summary
Single-layer nets have limited representation power (the linear separability problem). Adaline – Widrow-Hoff learning: define an error function that measures the performance of the network in terms of the weights, input, output and desired output. The ADALINE learning algorithm updates the weights by the delta rule. Adaptive filter: deals with the echo problem by mimicking the leakage of the incoming voice so that it can be suppressed from the outgoing signal. Adaline has better convergence properties than the Perceptron, is useful in noise correction, and there is an Adaline in every modem.
MADALINE
Madaline: Many Adalines
A Madaline is a combination of many Adalines connected together. This also enables the network to solve non-separable problems. Learning algorithms for Madalines have gone through three stages of development. All three algorithms adhere to the ''Minimum Disturbance'' principle proposed by Widrow (1962), instead of explicitly computing the gradient of the network error. Nodes whose net input is sufficiently small are selected as candidates for weight changes, and the possible result of changing the output of such a node is examined.
If the change results in a decrease in network error, then weights of connections leading into that node are changed using the LMS (or similar) algorithm; otherwise, these weights remain unchanged. The magnitude of the weight change may be large enough to force the node output to change, or may be small so that a multitude of such changes are needed to achieve network error reduction.  This process is repeated for all input patterns and for all nodes until network error becomes acceptably small.
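A rough Python sketch of this minimum-disturbance step for a Madaline whose output unit is a fixed OR of the hidden Adalines; the OR output rule, the ±1 coding, and all names are illustrative assumptions, not the exact procedure from the source.

```python
import numpy as np

def madaline_or_output(hidden_nets):
    """Fixed OR unit: +1 if any hidden Adaline fires (+1), else -1."""
    return 1 if np.any(np.sign(hidden_nets) == 1) else -1

def mri_step(W, b, x, target, alpha=0.2):
    """One MRI-style correction for a misclassified pattern: try flipping the
    hidden Adaline whose net input is closest to zero (minimum disturbance),
    and commit a delta-rule weight change only if the flip fixes the output."""
    net = W @ x + b                               # hidden Adalines' net inputs
    if madaline_or_output(net) == target:
        return W, b                               # already correct: no change
    for j in np.argsort(np.abs(net)):             # candidates: smallest |net| first
        trial = net.copy()
        trial[j] = -trial[j]                      # what if unit j changed sign?
        if madaline_or_output(trial) == target:
            desired = -np.sign(net[j])            # push unit j across zero
            W[j] += alpha * (desired - net[j]) * x
            b[j] += alpha * (desired - net[j])
            break
    return W, b
```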
Architecture
Madaline Rule I  (MRI) training algorithm.
Madaline Rule I (MRI) training algorithm.
The goal is to make the smallest possible perturbation to the network, by modifying the weights on connections leading into some Adaline (hidden node), so as to decrease the network error on the current input sample. Note that the Adaline output must change in sign in order to have any effect on the network output.
Madaline Rule I  (MRI) training algorithm.
Madaline Rule I  (MRI) training algorithm. A Madaline with an output node that computes the OR logical function.
Madaline Rule II  (MRII) training algorithm.
Madaline Rule II (MRII) training algorithm.
The Madaline Rule II (MRII) training algorithm is considerably different from backpropagation. The weights are initialized to small random values, and training patterns are repeatedly presented. The algorithm modifies the first hidden layer of Adalines (i.e., connection weights from input nodes to layer 1), then the second hidden layer (weights from layer 1 to layer 2), and so on.
Madaline Rule II (MRII) Training algorithm – A trial–and–error procedure with a minimum disturbance principle (those nodes that can affect the output error while incurring the least  change in their weights should have precedence in the learning process)
Madaline Rule II (MRII) training algorithm.
Madaline Rule II (MRII) training algorithm.
High-level structure of a Madaline II with two Adalines at the first level and one Adaline at the second level. The Madaline II architecture, shown in Figure 4.3, improves on the capabilities of Madaline I by using Adalines with modifiable weights at the output layer of the network, instead of fixed logic devices. (Figure 4.3)
Madaline Rule III  (MRIII) training algorithm.
Madaline Rule III (MRIII) training algorithm
The MRIII training algorithm was developed by Andes et al. (1988) to train feedforward networks with sigmoid node functions. This algorithm, described in figure 4.5, also follows the minimum disturbance principle, using trial adaptations of nodes instead of assuming the derivative of the node function to be known. Unlike MRII, the weights of all nodes are adapted in each iteration. The MRIII algorithm has been shown to be mathematically equivalent to backpropagation (Widrow and Lehr, 1990). However, each weight change involves considerably more computation than in backpropagation. MRIII has been advocated for some hardware implementations where the sigmoid node function is inaccurately implemented, so that the mathematically derived gradient is inapplicable. In such cases, MRIII is more useful since it effectively computes an approximation to the gradient, without assuming a specific sigmoid node function. Note that all nodes in each layer are perturbed in each iteration, unlike in MRII.
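A tiny sketch of the perturbation idea behind MRIII: estimate each node's error derivative by nudging its net input and re-measuring the network's squared error, rather than differentiating the node function analytically. `forward_error` is a hypothetical callback (assumed to re-run the network with the given net-input offsets and return the squared output error); the one-sided finite difference is a simplification for brevity.

```python
import numpy as np

def mriii_gradient_estimate(forward_error, n_nodes, eps=1e-3):
    """Finite-difference approximation to dE/d(net_j) for every node j.
    forward_error(offsets) is assumed to return the network's squared error
    when node j's net input is shifted by offsets[j]."""
    base = forward_error(np.zeros(n_nodes))
    grad = np.zeros(n_nodes)
    for j in range(n_nodes):
        offsets = np.zeros(n_nodes)
        offsets[j] = eps
        grad[j] = (forward_error(offsets) - base) / eps
    return grad
```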
Madaline Rule III (MRIII) training algorithm
Comparison of MR III with MR II
Comparison of MR III with MR II
MADALINE – XOR EXAMPLE
Binary inputs (x1, x2): (0,0), (0,1), (1,0), (1,1)
Bipolar inputs (x1, x2): (−1,−1), (−1,1), (1,−1), (1,1)
XOR target (bipolar): −1, 1, 1, −1
XOR target (binary): 0, 1, 1, 0
MADALINE- XOR EXAMPLE
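A worked sketch of the XOR example above using hand-picked weights (these particular values are an illustrative choice, not the trained weights from the slides): each hidden Adaline detects one of the two "on" corners of bipolar XOR, and a fixed OR unit combines them.

```python
import numpy as np

def sgn(v):
    return np.where(v >= 0, 1, -1)

W_hidden = np.array([[ 0.5, -0.5],     # Adaline z1: fires for (+1, -1)
                     [-0.5,  0.5]])    # Adaline z2: fires for (-1, +1)
b_hidden = np.array([-0.5, -0.5])
w_out, b_out = np.array([0.5, 0.5]), 0.5   # fixed OR unit at the output

def madaline_xor(x):
    z = sgn(W_hidden @ x + b_hidden)   # hidden Adaline outputs
    return int(sgn(w_out @ z + b_out)) # OR of the hidden outputs

for x in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    print(x, madaline_xor(np.array(x, dtype=float)))
# -> -1, 1, 1, -1  (bipolar XOR)
```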
A Madaline for Translation-Invariant Pattern Recognition
A Madaline for Translation-Invariant Pattern Recognition
Difficulties for pattern recognition: noise, incompleteness, distortion, transformation, occlusion.
○ Translation-invariant pattern recognition
Relationships among the weight matrices of Adalines
The Adalines possess identical sets of weight values, which have been trained to detect a particular pattern.
Extension: multiple slabs with different key weight matrices for discriminating more than two classes of patterns.
APPLICATION OF MADALINE: Vehicle inductive signature recognition using a Madaline neural network
The degree of difficulty of a classification task is primarily determined by the class overlap in the input space [1, 2]. The difficulty is even greater if, in addition to the overlap, there is also class imbalance and the number of available patterns is small. Consider, for instance, a classification problem in which the input patterns are the inductive signatures of two classes of vehicles, as shown in Fig. 1. These signals are collected by inductive loop traffic sensors [3], and the morphology of the curves in Fig. 1 is derived from the alteration of the loop's impedance as the vehicle passes over it [4]. It is hypothesized that the proximity of the metal parts of the axles alters the impedance of the loops and thus signals the presence of the axles; in this way, the vehicle can be classified by its number of axles. Inductive signatures are used in traffic surveillance and management systems to recognize the class of a vehicle, to estimate its speed, and even to identify individual vehicles, among other uses [5–9]. This information is used to build a statistical database that may help traffic surveillance and management systems in decision-making. The class of a vehicle is one of the most important pieces of information and serves, for instance, for access control to areas where circulation is restricted to certain types of vehicles and for charging different toll values at tollgates.
Other applications: NETtalk (text to speech), stock price prediction, weather forecasting, reading electrocardiograms, typing out simple sentences that are spoken to it, driving a car, flying a plane.
SUMMARY
Madaline: multiple Adalines connected; this enables the network to solve non-separable problems. Madaline Rule I, Madaline Rule II, and Madaline Rule III (MRI, MRII, MRIII) training algorithms. A Madaline for translation-invariant pattern recognition. Relationships among the weight matrices of Adalines. Application: vehicle inductive signature recognition using a Madaline neural network.
THANK YOU
