Perceptron
- 4. Perceptron A single artificial neuron that computes the weighted sum of its inputs and applies a threshold activation function. It is also called a TLU (Threshold Logic Unit). It effectively separates the input space into two categories by the hyperplane w^T x + b = 0
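A minimal sketch of such a TLU (the weights, bias, and input points below are illustrative, not from the slides):

```python
# A Threshold Logic Unit: output 1 if the weighted input crosses the threshold.
def tlu(w, x, b):
    net = sum(wi * xi for wi, xi in zip(w, x)) + b  # w^T x + b
    return 1 if net >= 0 else 0

# The hyperplane w^T x + b = 0 splits the input space into two categories:
print(tlu([1.0, 1.0], [1.0, 1.0], -1.5))  # 1: net input 0.5, above the hyperplane
print(tlu([1.0, 1.0], [0.0, 1.0], -1.5))  # 0: net input -0.5, below it
```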
- 5. History of Artificial Neural Networks McCulloch and Pitts (1943): first neural network model. Hebb (1949): proposed a mechanism for learning, namely increasing the synaptic weight between two neurons through repeated activation of one neuron by the other across that synapse (the model lacked inhibitory connections). Rosenblatt (1958): Perceptron network and the associated learning rule. Widrow & Hoff (1960): a new learning algorithm for linear neural networks (ADALINE). Minsky and Papert (1969): widely influential book about the limitations of single-layer perceptrons, which brought research on NNs largely to a halt. Work that still went on: Anderson, Kohonen (1972): use of ANNs as associative memory. Grossberg (1980): Adaptive Resonance Theory. Hopfield (1982): Hopfield Network. Kohonen (1982): Self-organizing maps. Rumelhart and McClelland (1986): Backpropagation algorithm for training multilayer feed-forward networks, which started a resurgence of NN research.
- 7. Types of Learning • Supervised Learning The network is provided with a set of examples of proper network behavior (inputs/targets) • Reinforcement Learning The network is only provided with a grade, or score, which indicates network performance • Unsupervised Learning Only network inputs are available to the learning algorithm. The network learns to categorize (cluster) the inputs.
- 8. Error-correcting Learning: 1. Perceptron 2. Delta Rule 3. Error Backpropagation
- 9. Decision Boundary • All points on the decision boundary have the same inner product (= -b) with the weight vector • Therefore they have the same projection onto the weight vector, so they must lie on a line orthogonal to the weight vector. Since w^T p = ||w|| ||p|| cos θ, the projection of p onto w is ||p|| cos θ = w^T p / ||w||
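This claim can be checked numerically (the weight vector, bias, and boundary points below are illustrative choices):

```python
import math

# Two points on the decision boundary w^T p + b = 0 have the same inner
# product with w, hence the same projection length onto w.
w = [2.0, 1.0]
b = -2.0              # boundary: 2*p1 + 1*p2 - 2 = 0
p1 = [1.0, 0.0]       # on the boundary
p2 = [0.0, 2.0]       # also on the boundary

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def proj_length(p, w):
    """Length of the projection of p onto w: w^T p / ||w||."""
    return dot(w, p) / math.sqrt(dot(w, w))

print(dot(w, p1), dot(w, p2))                    # both equal -b = 2.0
print(proj_length(p1, w) == proj_length(p2, w))  # True
```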
- 10. Two layers. Binary nodes that take the values 0 or 1. Continuous weights, initially chosen randomly.
- 12. Input Layer — A vector of predictor variable values (x1 ... xp) is presented to the input layer. The input layer (or processing before the input layer) standardizes these values so that the range of each variable is -1 to 1. The input layer distributes the values to each of the neurons in the hidden layer. In addition to the predictor variables, there is a constant input of 1.0, called the bias, that is fed to each of the neurons in the hidden layer; the bias is multiplied by a weight and added to the sum going into the neuron.
- 13. Hidden Layer — Arriving at a neuron in the hidden layer, the value from each input neuron is multiplied by a weight (wji), and the resulting weighted values are added together producing a combined value uj. The weighted sum (uj) is fed into a transfer function, σ, which outputs a value hj. The outputs from the hidden layer are distributed to the output layer.
- 14. Output Layer — Arriving at a neuron in the output layer, the value from each hidden layer neuron is multiplied by a weight (wkj), and the resulting weighted values are added together producing a combined value vk. The weighted sum (vk) is fed into a transfer function, σ, which outputs a value yk.
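The forward pass described in the last three slides can be sketched as follows (the layer sizes, the random weights, and the tanh choice for σ are illustrative assumptions; the slides do not name a specific transfer function, and clipping stands in for the standardization step):

```python
import numpy as np

rng = np.random.default_rng(0)

p, n_hidden, n_out = 3, 4, 2                   # predictors, hidden neurons, outputs
W_hidden = rng.normal(size=(n_hidden, p + 1))  # extra column for the bias input
W_out = rng.normal(size=(n_out, n_hidden + 1))

def forward(x):
    x = np.clip(x, -1.0, 1.0)   # stand-in for standardizing inputs to [-1, 1]
    x = np.append(x, 1.0)       # constant bias input of 1.0
    u = W_hidden @ x            # weighted sums u_j
    h = np.tanh(u)              # transfer function sigma -> h_j
    h = np.append(h, 1.0)       # bias input for the output layer
    v = W_out @ h               # weighted sums v_k
    return np.tanh(v)           # outputs y_k

y = forward(np.array([0.5, -0.2, 0.9]))
print(y.shape)  # (2,)
```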
- 16. Learning Problem To Be Solved Suppose we have an input pattern (0,1) and a single target output (1). We have a net input of -0.1, which gives an output pattern of (0). How could we adjust the weights so that this situation is remedied and the spontaneous output matches our target output pattern (1)?
- 17. Answer Increase the weights, so that the net input exceeds 0.0. E.g., add 0.2 to all weights. Observation: weights from input nodes with activation 0 do not have any effect on the net input, so we will leave them alone.
- 18. Perceptron algorithm in words For each node in the output layer: Calculate the error, which can only take the values -1, 0, and 1. If the error is 0, the goal has been achieved. Otherwise, we adjust the weights. Do not alter weights from inactivated input nodes. Increase the weight if the error was 1, decrease it if the error was -1.
- 19. Perceptron algorithm in rules Weight change = some small constant * (target activation - spontaneous output activation) * input activation. If we speak of error instead of "target activation minus spontaneous output activation", we have: Weight change = some small constant * error * input activation
- 21. Perceptron Learning Rule (Summary) How do we find the weights using a learning procedure? 1 - Choose initial weights randomly 2 - Present a randomly chosen pattern x 3 - Update weights using the Delta rule: w_ij(t+1) = w_ij(t) + η * err_i * x_j, where err_i = (target_i - output_i) and η is a small learning rate 4 - Repeat steps 2 and 3 until the stopping criterion (convergence, max number of iterations) is reached
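The four steps above can be sketched for the Boolean AND function, which is linearly separable, so the rule converges (the learning rate, seed, and epoch cap are illustrative choices; patterns are cycled through rather than sampled randomly, for determinism):

```python
import random

data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
random.seed(1)
w = [random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)]  # step 1: random weights
b = random.uniform(-0.5, 0.5)
eta = 0.1

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0

for epoch in range(100):                 # step 4: repeat until convergence
    mistakes = 0
    for x, target in data:               # step 2: present a pattern
        err = target - predict(x)        # err = target - output
        if err != 0:
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]  # step 3: delta rule
            b += eta * err               # bias as a weight on a constant input of 1
            mistakes += 1
    if mistakes == 0:                    # stopping criterion: a full error-free pass
        break

print([predict(x) for x, _ in data])     # [0, 0, 0, 1]
```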
- 22. Perceptron Convergence Theorem If a pattern set can be represented by a two-layer perceptron, the perceptron learning rule will always be able to find some correct weights.
- 23. Perceptron Limitations A single-layer perceptron can only learn linearly separable problems. The Boolean AND function is linearly separable, whereas the Boolean XOR function (and the parity problem in general) is not.
- 26. Apple/Banana Example - Self Study Training Set. Random Initial Weights. First Iteration: e = t_1 - a = -1 - 0 = -1
- 29. The Perceptron was a Big Hit It spawned the first wave of "connectionism", with great interest and optimism about the future of neural networks. The first neural network hardware was built in the late fifties and early sixties.
- 34. XOR problem XOR (exclusive OR): 0 XOR 0 = 0, 0 XOR 1 = 1, 1 XOR 0 = 1, 1 XOR 1 = 0 (since 1+1 = 2 = 0 mod 2). The perceptron does not work here: a single layer generates only a linear decision boundary.
- 35. Minsky & Papert (1969) offered a solution to the XOR problem by combining perceptron unit responses using a second layer of units.
- 36. Two-layer networks Inputs x_i (x_1 ... x_n); outputs of the 1st layer z_i; outputs y_j (y_1 ... y_m). 1st-layer weights v_ij (from node j to node i); 2nd-layer weights w_ij (from node j to node i).
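The second-layer idea can be hand-wired for XOR (the weights and thresholds below are chosen by hand for illustration, not learned): one hidden unit computes OR, the other computes AND, and the output fires only when OR is true and AND is not.

```python
def step(net):
    return 1 if net >= 0 else 0

def xor_net(x1, x2):
    z1 = step(x1 + x2 - 0.5)    # 1st-layer unit: OR  (fires if at least one input is 1)
    z2 = step(x1 + x2 - 1.5)    # 1st-layer unit: AND (fires only if both inputs are 1)
    return step(z1 - z2 - 0.5)  # 2nd layer: OR AND NOT AND = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))  # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```

Each unit alone is still linear, but the hidden layer re-represents the inputs so that the output unit's linear boundary suffices.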
- 38. Training Multilayer Perceptron Networks The goal of the training process is to find the set of weight values that will cause the output from the neural network to match the actual target values as closely as possible. There are several issues involved in designing and training a multilayer perceptron network: Selecting how many hidden layers to use in the network. Deciding how many neurons to use in each hidden layer. Finding a globally optimal solution that avoids local minima. Converging to an optimal solution in a reasonable period of time. Validating the neural network to test for overfitting.
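The training goal above can be sketched as plain gradient descent on the squared error (the layer sizes, tanh hidden units, a linear output, the absence of bias terms, the random data, and the learning rate are all illustrative simplifications):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))           # 20 training patterns, 3 predictors
T = rng.normal(size=(20, 1))           # target values
W1 = rng.normal(scale=0.5, size=(3, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
eta = 0.01

def loss():
    return float(np.mean((np.tanh(X @ W1) @ W2 - T) ** 2))

before = loss()
for _ in range(200):
    H = np.tanh(X @ W1)                # hidden activations
    E = H @ W2 - T                     # output error
    gW2 = H.T @ E / len(X)             # gradient wrt 2nd-layer weights
    gW1 = X.T @ ((E @ W2.T) * (1 - H ** 2)) / len(X)  # backprop through tanh
    W2 -= eta * gW2                    # adjust weights to reduce the mismatch
    W1 -= eta * gW1

print(before > loss())  # True: the output now matches the targets more closely
```

Issues from the slide such as local minima and overfitting are exactly what this naive loop does not address; in practice one adds validation data, multiple restarts, or better optimizers.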
- 43. Cybernetics and brain simulation Main articles: Cybernetics and Computational neuroscience There is no consensus on how closely the brain should be simulated. In the 1940s and 1950s, a number of researchers explored the connection between neurology, information theory, and cybernetics. Some of them built machines that used electronic networks to exhibit rudimentary intelligence, such as W. Grey Walter's turtles and the Johns Hopkins Beast. Many of these researchers gathered for meetings of the Teleological Society at Princeton University and the Ratio Club in England. [24] By 1960, this approach was largely abandoned, although elements of it would be revived in the 1980s.
- 45. General intelligence Main articles: Strong AI and AI-complete Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them. [12] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project. [74] Many of the problems above are considered AI-complete: to solve one problem, you must solve them all. For example, even a straightforward, specific task like machine translation requires that the machine follow the author's argument (reason), know what is being talked about (knowledge), and faithfully reproduce the author's intention (social intelligence). Machine translation, therefore, is believed to be AI-complete: it may require strong AI to be done as well as humans can do it. [75]
- 47. Some important conclusions from the work were as follows: Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained. Military: high-performance fighter aircraft