ARTIFICIAL NEURAL NETWORKS
PERCEPTRON
Perceptron: a single artificial neuron that computes its weighted input and uses a threshold activation function. It is also called a TLU (Threshold Logic Unit). It effectively separates the input space into two categories by the hyperplane: wᵀx + b = 0
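A minimal sketch of a TLU (function and example values are illustrative, not from the slides): it computes the weighted input plus bias and thresholds at zero, so the hyperplane wᵀx + b = 0 splits the input space in two.

```python
def tlu(weights, bias, inputs):
    # Weighted input plus bias, then a threshold activation function.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if s >= 0 else 0

# Points on either side of the hyperplane w.x + b = 0 get different classes:
print(tlu([1.0, 1.0], -1.5, [1, 1]))  # 1 (above the hyperplane)
print(tlu([1.0, 1.0], -1.5, [0, 1]))  # 0 (below the hyperplane)
```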
History of Artificial Neural Networks. McCulloch and Pitts (1943): first neural network model. Hebb (1949): proposed a mechanism for learning: increasing the synaptic weight between two neurons by repeated activation of one neuron by the other across that synapse (lacked the inhibitory connection). Rosenblatt (1958): Perceptron network and the associated learning rule. Widrow & Hoff (1960): a new learning algorithm for linear neural networks (ADALINE). Minsky and Papert (1969): widely influential book about the limitations of single-layer perceptrons, causing research on NNs mostly to come to an end. Some work still went on: Anderson, Kohonen (1972): use of ANNs as associative memory. Grossberg (1980): Adaptive Resonance Theory. Hopfield (1982): Hopfield Network. Kohonen (1982): self-organizing maps. Rumelhart and McClelland (1986): backpropagation algorithm for training multilayer feed-forward networks, which started a resurgence of NN research.
Error-correcting Learning. Associative Learning.
Types of Learning • Supervised Learning: the network is provided with a set of examples of proper network behavior (inputs/targets) • Reinforcement Learning: the network is only provided with a grade, or score, which indicates network performance • Unsupervised Learning: only network inputs are available to the learning algorithm; the network learns to categorize (cluster) the inputs.
Error-correcting Learning: 1. Perceptron 2. Delta Rule 3. Error Backpropagation
Decision Boundary • All points on the decision boundary have the same inner product (= −b) with the weight vector • Therefore they have the same projection onto the weight vector, so they must lie on a line orthogonal to the weight vector. Since wᵀp = ||w|| ||p|| cos θ, the projection of p onto w is ||p|| cos θ = wᵀp / ||w||.
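A small numeric check of the claim above (the boundary and points are assumed values, chosen for illustration): two points on the boundary wᵀp + b = 0 have the same inner product wᵀp = −b with w, hence the same projection onto w.

```python
import math

w, b = [2.0, 1.0], -2.0          # boundary: 2*p1 + 1*p2 - 2 = 0
pa, pb = [1.0, 0.0], [0.0, 2.0]  # both points satisfy the boundary equation

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

norm_w = math.sqrt(dot(w, w))

print(dot(w, pa), dot(w, pb))    # both equal -b = 2.0
print(dot(w, pa) / norm_w)       # projection of pa onto w ...
print(dot(w, pb) / norm_w)       # ... is the same as the projection of pb
```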
Two layers. Binary nodes that take values 0 or 1. Continuous weights, initially chosen randomly.
 
Input Layer — A vector of predictor variable values (x1 ... xp) is presented to the input layer. The input layer (or processing before the input layer) standardizes these values so that the range of each variable is -1 to 1. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias, that is fed to each of the hidden-layer neurons; the bias is multiplied by a weight and added to the sum going into the neuron.
Hidden Layer   —  Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight ( wji ), and the resulting weighted values are added together producing a combined value  uj . The weighted sum ( uj ) is fed into a transfer function, σ, which outputs a value  hj . The outputs from the hidden layer are distributed to the output layer.
Output Layer — Arriving at a neuron in the output layer, the value from each hidden-layer neuron is multiplied by a weight (wkj), and the resulting weighted values are added together producing a combined value vk. The weighted sum (vk) is fed into a transfer function, σ, which outputs a value yk.
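The three layers described above can be sketched as one forward pass. The weights and inputs here are hypothetical values just to trace the computation; σ is taken to be a sigmoid.

```python
import math

def sigma(v):
    # Sigmoid transfer function.
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    # Hidden layer: u_j = sum_i w_ji * x_i + bias weight, h_j = sigma(u_j)
    h = [sigma(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W_hidden, b_hidden)]
    # Output layer: v_k = sum_j w_kj * h_j + bias weight, y_k = sigma(v_k)
    y = [sigma(sum(w * hj for w, hj in zip(row, h)) + b)
         for row, b in zip(W_out, b_out)]
    return y

y = forward([0.5, -0.5],
            [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0],  # hidden weights, biases
            [[1.0, 1.0]], [0.0])                    # output weights, bias
print(y)  # a single output value in (0, 1)
```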
 
Learning Problem To Be Solved. Suppose we have an input pattern (0, 1) and a single target output pattern (1). We have a net input of –0.1, which gives an output pattern of (0). How could we adjust the weights so that this situation is remedied and the spontaneous output matches our target output pattern (1)?
Answer: increase the weights so that the net input exceeds 0.0, e.g., add 0.2 to all weights. Observation: weights from an input node with activation 0 do not have any effect on the net input, so we will leave them alone.
Perceptron algorithm in words. For each node in the output layer: calculate the error, which can only take the values −1, 0, and 1. If the error is 0, the goal has been achieved; otherwise, we adjust the weights. Do not alter weights from inactivated input nodes. Increase the weight if the error was 1, decrease it if the error was −1.
Perceptron algorithm in rules. Weight change = some small constant × (target activation − spontaneous output activation) × input activation. If we speak of error instead of "target activation minus spontaneous output activation", we have: weight change = some small constant × error × input activation
 
Perceptron Learning Rule (Summary). How do we find the weights using a learning procedure? 1 - Choose initial weights randomly 2 - Present a randomly chosen pattern x 3 - Update weights using the Delta rule: wij(t+1) = wij(t) + erri * xj, where erri = (targeti − outputi) 4 - Repeat steps 2 and 3 until the stopping criterion (convergence, max number of iterations) is reached
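The summarized procedure can be sketched on Boolean AND (the learning constant of 1.0 and the epoch limit are assumptions for illustration; the slides do not fix them):

```python
import random

random.seed(0)
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # Boolean AND

# 1 - Choose initial weights randomly.
w = [random.uniform(-1, 1) for _ in range(2)]
b = random.uniform(-1, 1)

def output(x):
    # Threshold unit: 1 if the net input (including bias) is >= 0.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

for epoch in range(100):                   # max number of iterations
    errors = 0
    for x, target in patterns:             # 2 - present patterns
        err = target - output(x)           # err_i = target_i - output_i
        if err != 0:
            errors += 1
            # 3 - Delta rule: w_ij(t+1) = w_ij(t) + err_i * x_j
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err                       # bias input is the constant 1.0
    if errors == 0:                        # 4 - stop on convergence
        break

print([output(x) for x, _ in patterns])  # [0, 0, 0, 1]
```

AND is linearly separable, so by the convergence theorem this loop always exits with all four patterns classified correctly.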
Perceptron Convergence Theorem: if a pattern set can be represented by a two-layer perceptron, the perceptron learning rule will always be able to find some correct weights.
Perceptron Limitations. A single-layer perceptron can only learn linearly separable problems. The Boolean AND function is linearly separable, whereas the Boolean XOR function (and the parity problem in general) is not.
Linear Separability: Boolean AND vs. Boolean XOR
Perceptron Limitations Linear Decision Boundary Linearly Inseparable Problems
Apple/Banana Example - Self Study. Training set, random initial weights, first iteration: e = t1 − a = 1 − 0 = 1
 
 
The Perceptron was a Big Hit. It spawned the first wave in "CONNECTIONISM", with great interest and optimism about the future of neural networks. The first neural network hardware was built in the late fifties and early sixties.
 
 
 
MULTILAYER PERCEPTRON
XOR problem. XOR (exclusive OR): 0+0=0, 0+1=1, 1+0=1, 1+1=2=0 mod 2. The perceptron does not work here: a single layer generates a linear decision boundary.
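A quick sketch of why the perceptron fails here (the zero initial weights and epoch limit are assumptions for illustration): running the learning rule on XOR never reaches an error-free epoch, because no single hyperplane separates the two classes.

```python
patterns = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # XOR
w, b = [0.0, 0.0], 0.0

def output(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

for epoch in range(1000):
    errors = 0
    for x, t in patterns:
        err = t - output(x)                # target minus output
        if err != 0:
            errors += 1
            w = [wi + err * xi for wi, xi in zip(w, x)]
            b += err
    if errors == 0:
        break

print(errors)  # still nonzero: XOR is not linearly separable
```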
Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units.
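A second layer of units does solve XOR. A minimal sketch with hand-set threshold units (these particular weights are a common textbook construction, not values from the slides):

```python
def step(s):
    # Threshold activation: fires when the net input is >= 0.
    return 1 if s >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)        # first-layer unit: OR
    h2 = step(-x1 - x2 + 1.5)       # first-layer unit: NAND
    return step(h1 + h2 - 1.5)      # second-layer unit: AND of h1 and h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # XOR truth table: 0, 1, 1, 0
```

Each hidden unit draws one linear boundary; the second layer combines the two half-planes into the non-convex XOR region.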
Two-layer networks: inputs x1 ... xn (xi); 1st-layer weights vij (from j to i) produce the outputs zi of the 1st layer; 2nd-layer weights wij (from j to i) produce the outputs y1 ... ym (yj).
Multilayer Perceptron Architecture
Training Multilayer Perceptron Networks. The goal of the training process is to find the set of weight values that will cause the output from the neural network to match the actual target values as closely as possible. There are several issues involved in designing and training a multilayer perceptron network: • Selecting how many hidden layers to use in the network • Deciding how many neurons to use in each hidden layer • Finding a globally optimal solution that avoids local minima • Converging to an optimal solution in a reasonable period of time • Validating the neural network to test for overfitting
 
HUMAN NEURON COMPARED TO ANN
 
APPLICATIONS OF PERCEPTRON
Cybernetics and brain simulation. There is no consensus on how closely the brain should be simulated. In the 1940s and 1950s, a number of researchers explored the connection between neurology, information theory, and cybernetics. Some of them built machines that used electronic networks to exhibit rudimentary intelligence, such as W. Grey Walter's turtles and the Johns Hopkins Beast. Many of these researchers gathered for meetings of the Teleological Society at Princeton University and the Ratio Club in England. [24] By 1960, this approach was largely abandoned, although elements of it would be revived in the 1980s.
 
General intelligence. Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them. [12] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project. [74] Many of the problems above are considered AI-complete: to solve one problem, you must solve them all. For example, even a straightforward, specific task like machine translation requires that the machine follow the author's argument (reason), know what is being talked about (knowledge), and faithfully reproduce the author's intention (social intelligence). Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it. [75]
 
Military: high-performance fighter aircraft. Some important conclusions from the work were as follows: • Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently • Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system • More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained
 
PERCEPTRON. Presented by: SURESH. G, SATHEESH. D, RAJA LAKSHMI. S
