This document discusses recurrent neural networks (RNNs) and their applications. It begins by explaining that RNNs can process input sequences of arbitrary lengths, unlike other neural networks. It then provides examples of RNN applications, such as predicting time series data, autonomous driving, natural language processing, and music generation. The document goes on to describe the fundamental concepts of RNNs, including recurrent neurons, memory cells, and different types of RNN architectures for processing input/output sequences. It concludes by demonstrating how to implement basic RNNs using TensorFlow's static_rnn and dynamic_rnn functions.
2. Recurrent Neural Network
Recurrent Neural Network
● Predicting the future is what we do all the time
○ Finishing a friend’s sentence
○ Anticipating the smell of coffee at breakfast, or
○ Catching the ball in the field
● In this chapter, we will cover RNNs
○ Networks which can predict the future
● Unlike all the nets we have discussed so far
○ RNNs can work on sequences of arbitrary lengths
○ Rather than on fixed-sized inputs
3. Recurrent Neural Network
Recurrent Neural Network - Applications
● RNN can analyze time series data
○ Such as stock prices, and
○ Tell you when to buy or sell
4. Recurrent Neural Network
Recurrent Neural Network - Applications
● In autonomous driving systems, RNN can
○ Anticipate car trajectories and
○ Help avoid accidents
5. Recurrent Neural Network
Recurrent Neural Network - Applications
● RNN can take sentences, documents, or audio samples as input
○ This makes them extremely useful
○ For natural language processing (NLP) systems such as
■ Automatic translation
■ Speech-to-text or
■ Sentiment analysis
6. Recurrent Neural Network
Recurrent Neural Network - Applications
● RNNs’ ability to anticipate also makes them capable of surprising creativity.
○ You can ask them to predict which are the most likely next notes in a melody
○ Then randomly pick one of these notes and play it.
○ Then ask the net for the next most likely notes, play it, and repeat the process again and again.
Here is an example melody produced by Google’s Magenta project
7. Recurrent Neural Network
Recurrent Neural Network
● In this chapter we will learn about
○ Fundamental concepts in RNNs
○ The main problem RNNs face
○ And the solutions to those problems
○ How to implement RNNs
● Finally, we will take a look at the
○ Architecture of a machine translation system
9. Recurrent Neural Network
Recurrent Neurons
● Up to now we have mostly looked at feedforward neural networks
○ Where the activations flow only in one direction
○ From the input layer to the output layer
● RNN looks much like a feedforward neural network
○ Except it also has connections pointing backward
10. Recurrent Neural Network
Recurrent Neurons
● Let’s look at the simplest possible RNN
○ Composed of just one neuron receiving inputs
○ Producing an output, and
○ Sending that output back to itself
11. Recurrent Neural Network
Recurrent Neurons
● At each time step t (also called a frame)
○ This recurrent neuron receives the inputs x(t)
○ As well as its own output from the previous time step y(t–1)
A recurrent neuron (left), unrolled through time (right)
12. Recurrent Neural Network
Recurrent Neurons
● We can represent this tiny network against the time axis (see the figure below)
● This is called unrolling the network through time
A recurrent neuron (left), unrolled through time (right)
13. Recurrent Neural Network
Recurrent Neurons
● We can easily create a layer of recurrent neurons
● At each time step t, every neuron receives both the
○ Input vector x(t), and
○ Output vector from the previous time step y(t–1)
A layer of recurrent neurons (left), unrolled through time (right)
14. Recurrent Neural Network
Recurrent Neurons
● Each recurrent neuron has two sets of weights
○ One for the inputs x(t), and the
○ Other for the outputs of the previous time step, y(t–1)
● Let’s call these weight vectors wx and wy
● The equation below represents the output of a single recurrent neuron:
y(t) = ϕ( x(t)ᵀ · wx + y(t–1)ᵀ · wy + b )
Output of a single recurrent neuron for a single instance
○ b is the bias term, and
○ ϕ() is the activation function, like ReLU
15. Recurrent Neural Network
Recurrent Neurons
● We can compute a whole layer’s output
○ In one shot for a whole mini-batch
○ Using a vectorized form of the previous equation:
Y(t) = ϕ( X(t) · Wx + Y(t–1) · Wy + b ) = ϕ( [X(t) Y(t–1)] · W + b )
Outputs of a layer of recurrent neurons for all instances in a mini-batch
16. Recurrent Neural Network
Recurrent Neurons
● Y(t) is an m × n_neurons matrix containing the
○ Layer’s outputs at time step t for each instance in the mini-batch
○ m is the number of instances in the mini-batch
○ n_neurons is the number of neurons
17. Recurrent Neural Network
Recurrent Neurons
● X(t) is an m × n_inputs matrix containing the inputs for all instances
○ n_inputs is the number of input features
18. Recurrent Neural Network
Recurrent Neurons
● Wx is an n_inputs × n_neurons matrix containing the connection weights for the inputs of the current time step
● Wy is an n_neurons × n_neurons matrix containing the connection weights for the outputs of the previous time step
19. Recurrent Neural Network
Recurrent Neurons
● The weight matrices Wx and Wy are often concatenated into a single weight matrix W of shape (n_inputs + n_neurons) × n_neurons
● b is a vector of size n_neurons containing each neuron’s bias term
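To make these shapes concrete, here is a minimal NumPy sketch of one time step of a recurrent layer (the sizes, random values, and tanh activation are illustrative assumptions):
>>> import numpy as np
>>> m, n_inputs, n_neurons = 4, 3, 5 # mini-batch size, input features, neurons
>>> X_t = np.random.rand(m, n_inputs) # inputs at time step t
>>> Y_prev = np.zeros((m, n_neurons)) # outputs from time step t-1 (all zeros at t = 0)
>>> Wx = np.random.rand(n_inputs, n_neurons)
>>> Wy = np.random.rand(n_neurons, n_neurons)
>>> b = np.zeros(n_neurons)
>>> Y_t = np.tanh(X_t @ Wx + Y_prev @ Wy + b) # Y(t) = phi(X(t)·Wx + Y(t-1)·Wy + b)
>>> W = np.vstack([Wx, Wy]) # concatenated W, shape (n_inputs + n_neurons) x n_neurons
>>> Y_alt = np.tanh(np.hstack([X_t, Y_prev]) @ W + b) # same result via [X(t) Y(t-1)]·W + b
>>> np.allclose(Y_t, Y_alt)
True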
20. Recurrent Neural Network
Memory Cells
● Since the output of a recurrent neuron at time step t is a
○ Function of all the inputs from previous time steps
○ We can say that it has a form of memory
● A part of a neural network that
○ Preserves some state across time steps is called a memory cell
21. Recurrent Neural Network
Memory Cells
● In general, a cell’s state at time step t, denoted h(t), is a
○ Function of some inputs at that time step and
○ Its state at the previous time step: h(t) = f(h(t–1), x(t))
● Its output at time step t, denoted y(t), is also a
○ Function of the previous state and the current inputs
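In plain Python, this recurrence is just a loop that threads the state through time; here is a generic sketch (the names cell_fn, x_seq, and h0 are illustrative):
>>> def run_cell(cell_fn, x_seq, h0):
        # Apply h(t) = f(h(t-1), x(t)) across a whole input sequence
        h = h0
        states = []
        for x_t in x_seq:
            h = cell_fn(h, x_t) # new state from previous state and current input
            states.append(h) # for the basic cells seen so far, output == state
        return states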
22. Recurrent Neural Network
Memory Cells
● In the case of the basic cells we have discussed so far
○ The output is simply equal to the state
○ But in more complex cells this is not always the case
A cell’s hidden state and its output may be different
23. Recurrent Neural Network
Input and Output Sequences
Sequence-to-sequence Network
● An RNN can simultaneously take a
○ Sequence of inputs and
○ Produce a sequence of outputs
24. Recurrent Neural Network
Input and Output Sequences
Sequence-to-sequence Network
● This type of network is useful for predicting time series
○ Such as stock prices
● We feed it the prices over the last N days and
○ It must output the prices shifted by one day into the future
○ i.e., from N – 1 days ago to tomorrow
25. Recurrent Neural Network
Input and Output Sequences
Sequence-to-vector Network
● Alternatively we could feed the network a sequence of inputs and
○ Ignore all outputs except for the last one
26. Recurrent Neural Network
Input and Output Sequences
Sequence-to-vector Network
● We can feed this network a sequence of words
○ Corresponding to a movie review and
○ The network would output a sentiment score
○ e.g., from –1 [hate] to +1 [love]
27. Recurrent Neural Network
Input and Output Sequences
Vector-to-sequence Network
● We could feed the network a single input at the first time step and
○ Zeros for all other time steps and
○ Let it output a sequence
● For example, the input could be an image and the
○ Output could be a caption for the image
28. Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● In this network, we have a
○ sequence-to-vector network, called an encoder, followed by a
○ vector-to-sequence network, called a decoder
29. Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● This can be used for translating a sentence
○ From one language to another
● We feed the network a sentence in one language
○ The encoder converts this sentence into a single vector representation
○ Then the decoder decodes this vector into a sentence in another language
30. Recurrent Neural Network
Input and Output Sequences
Encoder-Decoder
● This two-step model works much better than
○ Trying to translate on the fly with a
○ Single sequence-to-sequence RNN
● Since the last words of a sentence can affect the
○ First words of the translation
○ We need to wait until we know the whole sentence
32. Recurrent Neural Network
Basic RNNs in TensorFlow
● Let’s implement a very simple RNN model
○ Without using any of TensorFlow’s RNN operations
○ To better understand what goes on under the hood
● Let’s create an RNN composed of a layer of five recurrent neurons
○ Using the tanh activation function
○ Running over only two time steps and
○ Taking input vectors of size 3 at each time step (see the sketch below)
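Here is a minimal sketch of what such a manual implementation might look like, in the same TF 1.x style used later in this deck (the random-normal weight initialization is an assumption):
>>> import tensorflow as tf
>>> n_inputs = 3
>>> n_neurons = 5
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs]) # inputs at t = 0
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs]) # inputs at t = 1
>>> Wx = tf.Variable(tf.random_normal(shape=[n_inputs, n_neurons]))
>>> Wy = tf.Variable(tf.random_normal(shape=[n_neurons, n_neurons]))
>>> b = tf.Variable(tf.zeros([1, n_neurons]))
>>> Y0 = tf.tanh(tf.matmul(X0, Wx) + b)
>>> Y1 = tf.tanh(tf.matmul(Y0, Wy) + tf.matmul(X1, Wx) + b)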
33. Recurrent Neural Network
Basic RNNs in TensorFlow
● This network looks like a two-layer feedforward neural network, with two differences
○ The same weights and bias terms are shared by both layers and
○ We feed inputs at each layer, and we get outputs from each layer
34. Recurrent Neural Network
Basic RNNs in TensorFlow
● To run the model, we need to feed it the inputs at both time steps
● The mini-batch contains four instances
○ Each with an input sequence composed of exactly two inputs, as sketched below
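A sketch of the corresponding run phase, continuing the construction-phase snippet above (the input values are made up for illustration):
>>> import numpy as np
>>> init = tf.global_variables_initializer()
>>> X0_batch = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 0, 1]]) # inputs at t = 0
>>> X1_batch = np.array([[9, 8, 7], [0, 0, 0], [6, 5, 4], [3, 2, 1]]) # inputs at t = 1
>>> with tf.Session() as sess:
        init.run()
        Y0_val, Y1_val = sess.run([Y0, Y1],
                                  feed_dict={X0: X0_batch, X1: X1_batch})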
35. Recurrent Neural Network
Basic RNNs in TensorFlow
● At the end, Y0_val and Y1_val contain the outputs of the network
○ At both time steps for all neurons and
○ All instances in the mini-batch
37. Recurrent Neural Network
Static Unrolling Through Time
● Let’s look at how to create the same model
○ Using TensorFlow’s RNN operations
● The static_rnn() function creates
○ An unrolled RNN network by chaining cells
● The code below creates the exact same model as the previous one
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
38. Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● First we create the input placeholders
39. Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● Then we create a BasicRNNCell
○ It is like a factory that creates
○ Copies of the cell to build the unrolled RNN
■ One for each time step
40. Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● Then we call static_rnn(), giving it the cell factory and the input tensors
● And telling it the data type of the inputs
○ This is used to create the initial state matrix
○ Which by default is full of zeros
41. Recurrent Neural Network
Static Unrolling Through Time
>>> X0 = tf.placeholder(tf.float32, [None, n_inputs])
>>> X1 = tf.placeholder(tf.float32, [None, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, [X0, X1], dtype=tf.float32
)
>>> Y0, Y1 = output_seqs
● The static_rnn() function returns two objects
● The first is a Python list containing the output tensors for each time step
● The second is a tensor containing the final states of the network
● When we use basic cells
○ Then the final state is equal to the last output
42. Recurrent Neural Network
Static Unrolling Through Time
Check out the complete code under the “Using static_rnn()” section in the notebook
43. Recurrent Neural Network
Static Unrolling Through Time
● In the previous example, if there were 50 time steps
○ It would not be convenient to define
○ 50 placeholders and 50 output tensors
● Moreover, at execution time we would have to feed
○ Each of the 50 placeholders and manipulate the 50 outputs
● Let’s do it in a better way
44. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● The above code takes a single input placeholder of
○ shape [None, n_steps, n_inputs]
○ Where the first dimension is the mini-batch size
45. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Then it extracts the list of input sequences for each time step
● X_seqs is a Python list of n_steps tensors of shape [None, n_inputs]
○ Where the first dimension is the mini-batch size
46. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● To do this, we first swap the first two dimensions
○ Using the transpose() function so that the
○ Time steps are now the first dimension
47. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Then we extract a Python list of tensors along the first dimension
○ i.e., one tensor per time step
○ Using the unstack() function
48. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● The next two lines are the same as before
49. Recurrent Neural Network
Static Unrolling Through Time
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> X_seqs = tf.unstack(tf.transpose(X, perm=[1, 0, 2]))
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> output_seqs, states = tf.contrib.rnn.static_rnn(
basic_cell, X_seqs, dtype=tf.float32
)
>>> outputs = tf.transpose(tf.stack(output_seqs), perm=[1, 0, 2])
● Finally, we merge all the output tensors into a single tensor
○ Using the stack() function
● And then we swap the first two dimensions to get a
○ Final outputs tensor of shape [None, n_steps, n_neurons]
50. Recurrent Neural Network
Static Unrolling Through Time
● Now we can run the network by
○ Feeding it a single tensor that contains
○ All the mini-batch sequences
51. Recurrent Neural Network
Static Unrolling Through Time
● And then we get a single outputs_val tensor for
○ All instances
○ All time steps, and
○ All neurons
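As a minimal sketch of what this looks like (assuming an init = tf.global_variables_initializer() op has been added to the graph above; the batch values here are made up):
>>> import numpy as np
>>> X_batch = np.random.rand(4, n_steps, n_inputs) # mini-batch of 4 instances
>>> with tf.Session() as sess:
init.run()
outputs_val = outputs.eval(feed_dict={X: X_batch})
>>> outputs_val.shape # (4, n_steps, n_neurons)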
52. Recurrent Neural Network
Static Unrolling Through Time
Check out the complete code under the “Packing
sequences” section in the notebook
53. Recurrent Neural Network
Static Unrolling Through Time
● The previous approach still builds a graph
○ Containing one cell per time step
● If there were 50 time steps, the graph would look ugly
● It is like writing a program without using for loops
○ Y0=f(0,X0); Y1=f(Y0, X1); Y2=f(Y1, X2); ...; Y50=f(Y49, X50))
● With such a large graph
○ Since it must store all tensor values during the forward pass
○ So it can use them to compute gradients during the reverse pass
○ We may get out-of-memory (OOM) errors
○ During backpropagation (in GPU cards because of limited memory)
54. Recurrent Neural Network
Dynamic Unrolling Through Time
Let’s look at a better solution than the previous
approach: the dynamic_rnn() function
55. Recurrent Neural Network
Dynamic Unrolling Through Time
● The dynamic_rnn() function uses a while_loop() operation to
○ Run over the cell the appropriate number of times
● We can set swap_memory=True
○ If we want it to swap the GPU’s memory to the CPU’s memory
○ During backpropagation, to avoid out-of-memory errors
● It also accepts a single tensor for
○ All inputs at every time step (shape [None, n_steps, n_inputs]) and
○ It outputs a single tensor for all outputs at every time step
■ (shape [None, n_steps, n_neurons])
○ There is no need to stack, unstack, or transpose
56. Recurrent Neural Network
Dynamic Unrolling Through Time
RNN using dynamic_rnn
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
dtype=tf.float32)
57. Recurrent Neural Network
Dynamic Unrolling Through Time
Check out the complete code under the “Using
dynamic_rnn()” section in the notebook
58. Recurrent Neural Network
Dynamic Unrolling Through Time
Note
● During backpropagation
○ The while_loop() operation does the appropriate magic
○ It stores the tensor values for each iteration during the forward pass
○ So it can use them to compute gradients during the reverse pass
59. Recurrent Neural Network
Handling Variable Length Input Sequences
● So far we have used only fixed-size input sequences
● What if the input sequences have variable lengths (e.g., sentences)?
● In this case we should set the sequence_length parameter
○ When calling the dynamic_rnn() function
○ It must be a 1D tensor indicating the length of the input sequence
for each instance, as shown below
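For example (a minimal sketch, reusing basic_cell and X from the earlier dynamic_rnn() example):
>>> seq_length = tf.placeholder(tf.int32, [None])
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32,
sequence_length=seq_length)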
60. Recurrent Neural Network
Handling Variable Length Input Sequences
● Suppose the second input sequence contains
○ Only one input instead of two
○ Then it must be padded with a zero vector
○ In order to fit in the input tensor X
61. Recurrent Neural Network
Handling Variable Length Input Sequences
● Now we need to feed values for both placeholders X and seq_length
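A sketch of what that feed might look like, assuming n_steps = 2 and n_inputs = 3 (the actual values in the notebook may differ):
>>> X_batch = np.array([
# step 0 step 1
[[0, 1, 2], [9, 8, 7]], # instance 0
[[3, 4, 5], [0, 0, 0]], # instance 1 (padded with a zero vector)
[[6, 7, 8], [6, 5, 4]], # instance 2
[[9, 0, 1], [3, 2, 1]], # instance 3
])
>>> seq_length_batch = np.array([2, 1, 2, 2])
>>> with tf.Session() as sess:
init.run()
outputs_val, states_val = sess.run(
[outputs, states],
feed_dict={X: X_batch, seq_length: seq_length_batch})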
62. Recurrent Neural Network
Handling Variable Length Input Sequences
● Now the RNN outputs zero vectors for
○ Every time step past the input sequence length
○ Look at the second instance’s output for the second time step
63. Recurrent Neural Network
Handling Variable Length Input Sequences
● Moreover, the states tensor contains the final state of each cell
○ Excluding the zero vectors
64. Recurrent Neural Network
Handling Variable Length Input Sequences
Check out the complete code under the “Setting
the sequence lengths” section in the notebook
65. Recurrent Neural Network
Handling Variable-Length Output Sequences
● What if the output sequences have variable lengths?
● If we know in advance what length each sequence will have
○ For example if we know that it will be the same length as the input
sequence
○ Then we can set the sequence_length parameter as discussed
● Unfortunately, in general this will not be possible
○ For example,
■ The length of a translated sentence is generally different from the
■ Length of the input sentence
66. Recurrent Neural Network
Handling Variable-Length Output Sequences
● In this case, the most common solution is to define
○ A special output called an end-of-sequence token (EOS token)
● Any output past the EOS token should be ignored; we will discuss this
later in detail
69. Recurrent Neural Network
Training RNNs
● To train an RNN, the trick is to unroll it through time and
then simply use regular backpropagation
● This strategy is called backpropagation through time
(BPTT)
70. Recurrent Neural Network
Training RNNs
Understanding how RNNs are trained
Just like in regular backpropagation, there is a first forward pass
through the unrolled network, represented by the dashed
arrows
71. Recurrent Neural Network
Training RNNs
Understanding how RNNs are trained
Then the output sequence is evaluated using a cost function
C(Y(tmin), ..., Y(tmax)), where tmin and tmax are the first
and last output time steps, not counting the ignored outputs
72. Recurrent Neural Network
Then the gradients of that cost function are propagated
backward through the unrolled network, represented by the
solid arrows
Training RNNs
Understanding how RNNs are trained
73. Recurrent Neural Network
And finally the model parameters are updated using the
gradients computed during BPTT
Training RNNs
Understanding how RNNs are trained
74. Recurrent Neural Network
Note that the gradients flow backward through all the outputs
used by the cost function, not just through the final output
Training RNNs
Understanding how RNNs are trained
75. Recurrent Neural Network
Here, the cost function is computed using the last three outputs
of the network, Y(2), Y(3), and Y(4), so gradients flow through
these three outputs, but not through Y(0) and Y(1)
Training RNNs
Understanding how RNNs are trained
76. Recurrent Neural Network
Moreover, since the same parameters W and b are used at
each time step, backpropagation will do the right thing and sum
over all time steps
Training RNNs
Understanding how RNNs are trained
78. Recurrent Neural Network
Training a Sequence Classifier
● We will train an RNN to classify MNIST images
● A convolutional neural network would be better suited for image
classification
● But this makes for a simple example on data we are already familiar with
79. Recurrent Neural Network
Training a Sequence Classifier
Overview of the task
● We will treat each image as a sequence of 28 rows of 28 pixels each,
since each MNIST image is 28 × 28 pixels
● We will use cells of 150 recurrent neurons, plus a fully connected
layer containing 10 neurons, one per class, connected to the output of
the last time step
● This will be followed by a softmax layer
81. Recurrent Neural Network
Construction Phase
● The construction phase is quite straightforward
● It’s pretty much the same as the MNIST classifier we built previously,
except that an unrolled RNN replaces the hidden layers
● Note that the fully connected layer is connected to the states tensor,
which contains only the final state of the RNN (i.e., the 28th output)
Training a Sequence Classifier
82. Recurrent Neural Network
Construction Phase
>>> from tensorflow.contrib.layers import fully_connected
>>> n_steps = 28
>>> n_inputs = 28
>>> n_neurons = 150
>>> n_outputs = 10
>>> learning_rate = 0.001
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.int32, [None])
>>> basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
>>> outputs, states = tf.nn.dynamic_rnn(basic_cell, X,
dtype=tf.float32)
Training a Sequence Classifier
Run it on Notebook
83. Recurrent Neural Network
Construction Phase
>>> logits = fully_connected(states, n_outputs, activation_fn=None)
>>> xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
labels=y, logits=logits)
>>> loss = tf.reduce_mean(xentropy)
>>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
>>> training_op = optimizer.minimize(loss)
>>> correct = tf.nn.in_top_k(logits, y, 1)
>>> accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
>>> init = tf.global_variables_initializer()
Training a Sequence Classifier
Run it on Notebook
84. Recurrent Neural Network
Load the MNIST data and reshape it
Now we will load the MNIST data and reshape the test data to [batch_size,
n_steps, n_inputs] as is expected by the network
>>> from tensorflow.examples.tutorials.mnist import input_data
>>> mnist = input_data.read_data_sets("data/mnist/")
>>> X_test = mnist.test.images.reshape((-1, n_steps, n_inputs))
>>> y_test = mnist.test.labels
>>> y_test = mnist.test.labels
Training a Sequence Classifier
Run it on Notebook
85. Recurrent Neural Network
Training the RNN
We reshape each training batch before feeding it to the network
>>> n_epochs = 100
>>> batch_size = 150
>>> with tf.Session() as sess:
init.run()
for epoch in range(n_epochs):
for iteration in range(mnist.train.num_examples // batch_size):
X_batch, y_batch = mnist.train.next_batch(batch_size)
X_batch = X_batch.reshape((-1, n_steps, n_inputs))
sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
acc_test = accuracy.eval(feed_dict={X: X_test, y: y_test})
print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
Training a Sequence Classifier
Run it on Notebook
86. Recurrent Neural Network
The Output
The output should look like this:
0 Train accuracy: 0.713333 Test accuracy: 0.7299
1 Train accuracy: 0.766667 Test accuracy: 0.7977
...
98 Train accuracy: 0.986667 Test accuracy: 0.9777
99 Train accuracy: 0.986667 Test accuracy: 0.9809
Training a Sequence Classifier
87. Recurrent Neural Network
Conclusion
● We get over 98% accuracy — not bad!
● Plus we would certainly get a better result by
○ Tuning the hyperparameters
○ Initializing the RNN weights using He initialization
○ Training longer
○ Or adding a bit of regularization e.g., dropout
Training a Sequence Classifier
88. Recurrent Neural Network
Training to Predict Time Series
Now, we will train an RNN to predict the next value in a
generated time series
89. Recurrent Neural Network
Training to Predict Time Series
● Each training instance is a randomly selected sequence of 20 consecutive
values from the time series
● And the target sequence is the same as the input sequence, except it is
shifted by one time step into the future
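As a hedged illustration of how such instances might be generated (time_series() here is a hypothetical stand-in; the exact series used in the notebook may differ):
>>> import numpy as np
>>> t_min, t_max, resolution = 0, 30, 0.1
>>> def time_series(t):
return t * np.sin(t) / 3 + 2 * np.sin(t * 5) # hypothetical series
>>> def next_batch(batch_size, n_steps):
# pick a random starting point for each instance
t0 = np.random.rand(batch_size, 1) * (t_max - t_min - n_steps * resolution)
Ts = t0 + np.arange(0., n_steps + 1) * resolution
ys = time_series(Ts)
# inputs: the first n_steps values; targets: the same values shifted one step ahead
return ys[:, :-1].reshape(-1, n_steps, 1), ys[:, 1:].reshape(-1, n_steps, 1)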
90. Recurrent Neural Network
Training to Predict Time Series
Construction Phase
● It will contain 100 recurrent neurons and we will unroll it over 20
time steps since each training instance will be 20 inputs long
● Each input will contain only one feature, the value at that time
● The targets are also sequences of 20 inputs, each containing a single
value
91. Recurrent Neural Network
Construction Phase
>>> n_steps = 20
>>> n_inputs = 1
>>> n_neurons = 100
>>> n_outputs = 1
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
activation=tf.nn.relu)
>>> outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
Training to Predict Time Series
Run it on Notebook
92. Recurrent Neural Network
Construction Phase
● At each time step we now have an output vector of size 100
● But what we actually want is a single output value at each time step
● The simplest solution is to wrap the cell in an
OutputProjectionWrapper
Training to Predict Time Series
93. Recurrent Neural Network
Construction Phase
● A cell wrapper acts like a normal cell, proxying every method call to an
underlying cell, but it also adds some functionality
● The OutputProjectionWrapper adds a fully connected layer of linear
neurons (i.e., without any activation function) on top of each output,
but it does not affect the cell state
● All these fully connected layers share the same trainable weights and bias
terms.
Training to Predict Time Series
95. Recurrent Neural Network
Wrapping a cell is quite easy
Let’s tweak the preceding code by wrapping the BasicRNNCell into an
OutputProjectionWrapper
>>> cell = tf.contrib.rnn.OutputProjectionWrapper(
tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu),
output_size=n_outputs)
Training to Predict Time Series
Run it on Notebook
96. Recurrent Neural Network
Cost Function and Optimizer
● Now we will define the cost function
● We will use the Mean Squared Error (MSE)
● Next we will create an Adam optimizer, the training op, and the variable
initialization op
>>> learning_rate = 0.001
>>> loss = tf.reduce_mean(tf.square(outputs - y))
>>> optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
>>> training_op = optimizer.minimize(loss)
>>> init = tf.global_variables_initializer()
Training to Predict Time Series
Run it on Notebook
97. Recurrent Neural Network
Execution Phase
>>> n_iterations = 10000
>>> batch_size = 50
>>> with tf.Session() as sess:
init.run()
for iteration in range(n_iterations):
X_batch, y_batch = [...] # fetch the next training batch
sess.run(training_op, feed_dict={X: X_batch, y:y_batch})
if iteration % 100 == 0:
mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
print(iteration, "\tMSE:", mse)
Training to Predict Time Series
Run it on Notebook
98. Recurrent Neural Network
Execution Phase
The program’s output should look like this
0 MSE: 379.586
100 MSE: 14.58426
200 MSE: 7.14066
300 MSE: 3.98528
400 MSE: 2.00254
[...]
Training to Predict Time Series
99. Recurrent Neural Network
Making Predictions
Once the model is trained, you can make predictions:
>>> X_new = [...] # New sequences
>>> y_pred = sess.run(outputs, feed_dict={X: X_new})
Training to Predict Time Series
100. Recurrent Neural Network
Making Predictions
Training to Predict Time Series
The figure shows the predicted sequence for the instances after 1,000 training iterations
101. Recurrent Neural Network
● Although using an OutputProjectionWrapper is the simplest solution
to reduce the dimensionality of the RNN’s output sequences down to just
one value per time step per instance
● It is not the most efficient
Training to Predict Time Series
102. Recurrent Neural Network
● There is a trickier but more efficient solution:
○ We can reshape the RNN outputs from [batch_size, n_steps,
n_neurons] to [batch_size * n_steps, n_neurons]
○ Then apply a single fully connected layer with the appropriate output
size (in our case just 1), which will result in an output tensor of shape
[batch_size * n_steps, n_outputs]
○ And then reshape this tensor to [batch_size, n_steps, n_outputs]
Training to Predict Time Series
103. Recurrent Neural Network
Reshape the RNN outputs from [batch_size, n_steps, n_neurons] to
[batch_size * n_steps, n_neurons]
Training to Predict Time Series
104. Recurrent Neural Network
Apply a single fully connected layer with the appropriate output size (in
our case just 1), which will result in an output tensor of shape
[batch_size * n_steps, n_outputs]
Training to Predict Time Series
105. Recurrent Neural Network
And then reshape this tensor to [batch_size, n_steps, n_outputs]
Training to Predict Time Series
106. Recurrent Neural Network
Let’s implement this solution
● We first revert to a basic cell, without the OutputProjectionWrapper
>>> cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons,
activation=tf.nn.relu)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(cell, X,
dtype=tf.float32)
Training to Predict Time Series
Run it on Notebook
107. Recurrent Neural Network
Let’s implement this solution
● Then we stack all the outputs using the reshape() operation, apply the
fully connected linear layer (without using any activation function; this is
just a projection), and finally unstack all the outputs, again using reshape()
>>> stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])
>>> stacked_outputs = fully_connected(stacked_rnn_outputs,
n_outputs, activation_fn=None)
>>> outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
Training to Predict Time Series
Run it on Notebook
108. Recurrent Neural Network
Let’s implement this solution
● The rest of the code is the same as earlier. This can provide a significant
speed boost since there is just one fully connected layer instead of one
per time step.
Training to Predict Time Series
110. Recurrent Neural Network
Creative RNN
● All we need to do is provide it a seed sequence containing n_steps
values (e.g., full of zeros)
● Use the model to predict the next value
● Append this predicted value to the sequence
● Feed the last n_steps values to the model to predict the next value
● And so on
This process generates a new sequence that has some resemblance to the
original time series
111. Recurrent Neural Network
Creative RNN
>>> import numpy as np
>>> sequence = [0.] * n_steps
>>> for iteration in range(300):
X_batch = np.array(sequence[-n_steps:]).reshape(1, n_steps, 1)
y_pred = sess.run(outputs, feed_dict={X: X_batch})
sequence.append(y_pred[0, -1, 0])
Run it on Notebook
118. Recurrent Neural Network
● To implement a deep RNN in TensorFlow, we can create several cells and
stack them into a MultiRNNCell
● In the following code we stack three identical cells
>>> n_neurons = 100
>>> n_layers = 3
>>> layers = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
for _ in range(n_layers)] # one cell object per layer
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
>>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
Deep RNNs - Implementation in TensorFlow
Run it on Notebook
119. Recurrent Neural Network
>>> outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
● The states variable is a tuple containing one tensor per layer, each
representing the final state of that layer’s cell with shape [batch_size,
n_neurons]
● If you set state_is_tuple=False when creating the MultiRNNCell,
then states becomes a single tensor containing the states from every
layer, concatenated along the column axis (i.e., its shape is [batch_size,
n_layers * n_neurons])
Deep RNNs - Implementation in TensorFlow
120. Recurrent Neural Network
● If you build a very deep RNN, it may end up overfitting the training set
● To prevent that, a common technique is to apply dropout
● You can simply add a dropout layer before or after the RNN as usual
● But if you also want to apply dropout between the RNN layers, you need
to use a DropoutWrapper
Deep RNNs - Applying Dropout
121. Recurrent Neural Network
● The following code applies dropout to the inputs of each layer in the
RNN, dropping each input with a 50% probability
>>> keep_prob = 0.5
>>> cells_drop = [tf.contrib.rnn.DropoutWrapper(
tf.contrib.rnn.BasicRNNCell(num_units=n_neurons),
input_keep_prob=keep_prob)
for _ in range(n_layers)] # again, one wrapped cell object per layer
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell(cells_drop)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
Deep RNNs - Applying Dropout
Run it on Notebook
122. Recurrent Neural Network
● It is also possible to apply dropout to the outputs by setting
output_keep_prob
● The main problem with this code is that it will apply dropout not only
during training but also during testing, which is not what we want
● Since dropout should be applied only during training
Deep RNNs - Applying Dropout
123. Recurrent Neural Network
● Unfortunately, the DropoutWrapper does not support an is_training
placeholder
● So we must either write our own dropout wrapper class, or have two
different graphs:
○ One for training
○ And the other for testing
Let’s implement the second option
Deep RNNs - Applying Dropout
124. Recurrent Neural Network
>>> import sys
>>> is_training = (sys.argv[-1] == "train")
>>> X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
>>> y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
>>> cells = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons)
for _ in range(n_layers)]
>>> if is_training:
cells = [tf.contrib.rnn.DropoutWrapper(cell, input_keep_prob=keep_prob)
for cell in cells]
>>> multi_layer_cell = tf.contrib.rnn.MultiRNNCell(cells)
>>> rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X,
dtype=tf.float32)
[...] # build the rest of the graph
>>> init = tf.global_variables_initializer()
>>> saver = tf.train.Saver()
>>> with tf.Session() as sess:
if is_training:
init.run()
for iteration in range(n_iterations):
[...] # train the model
save_path = saver.save(sess, "/tmp/my_model.ckpt")
else:
saver.restore(sess, "/tmp/my_model.ckpt")
[...] # use the model
Run it on Notebook
Deep RNNs - Applying Dropout
125. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● To train an RNN on long sequences, we will need to run it over many
time steps, making the unrolled RNN a very deep network
● Just like any deep neural network it may suffer from the
vanishing/exploding gradients problem and take forever to train
Deep RNNs
126. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Many of the tricks we discussed to alleviate this problem can be used for
deep unrolled RNNs as well:
○ Good parameter initialization,
○ Nonsaturating activation functions (e.g., ReLU),
○ Batch Normalization,
○ Gradient Clipping,
○ And faster optimizers
Deep RNNs
127. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● However, if the RNN needs to handle even moderately long sequences
e.g., 100 inputs, then training will still be very slow
● The simplest and most common solution to this problem is to unroll the
RNN only over a limited number of time steps during training
● This is called truncated backpropagation through time
Deep RNNs
129. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● In TensorFlow you can implement truncated backpropagation
through time simply by truncating the input sequences
● For example, in the time series prediction problem, you would simply
reduce n_steps during training
● The problem with this is that the model will not be able to learn
long-term patterns
How can we solve this problem?
Deep RNNs
130. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● One workaround could be to make sure that these shortened sequences
contain both old and recent data
● So that the model can learn to use both
● E.g., the sequence could contain monthly data for the last five months,
then weekly data for the last five weeks, then daily data over the last five
days
● But this workaround has its limits:
○ What if fine-grained data from last year is actually useful?
○ What if there was a brief but significant event that absolutely must be
taken into account, even years later (e.g., the result of an election)?
Deep RNNs
131. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Besides the long training time
○ A second problem faced by long-running RNNs is the fact that the
memory of the first inputs gradually fades away
○ Indeed, due to the transformations that the data goes through when
traversing an RNN, some information is lost after each time step.
● After a while, the RNN’s state contains virtually no trace of the first
inputs
Let’s understand this with an example
Deep RNNs
132. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● Say you want to perform sentiment analysis on a long review that starts
with the four words “I loved this movie,”
● But the rest of the review lists the many things that could have made the
movie even better
● If the RNN gradually forgets the first four words, it will completely
misinterpret the review
Deep RNNs
133. Recurrent Neural Network
The Difficulty of Training over Many Time Steps
● To solve this problem, various types of cells with long-term memory have
been introduced
● They have proved so successful that the basic cells are not much used
anymore
Let’s study about these long memory cells
Deep RNNs
135. Recurrent Neural Network
● The Long Short-Term Memory (LSTM) cell was proposed in 1997 by
Sepp Hochreiter and Jürgen Schmidhuber
● And it was gradually improved over the years by several researchers,
such as Alex Graves, Haşim Sak, Wojciech Zaremba, and many more
LSTM Cell
136. Recurrent Neural Network
LSTM Cell
● If you consider the LSTM cell as a black box, it can be used very much
like a basic cell
● Except
○ It will perform much better
○ Training will converge faster
○ And it will detect long-term dependencies in the data
In TensorFlow, you can simply use a BasicLSTMCell instead of a
BasicRNNCell
>>> lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons)
137. Recurrent Neural Network
LSTM Cell
● LSTM cells manage two state vectors, and for performance reasons they
are kept separate by default
● We can change this default behavior by setting state_is_tuple=False
when creating the BasicLSTMCell
139. Recurrent Neural Network
● The LSTM cell looks exactly like a regular cell, except that its state is
split into two vectors: h(t) and c(t), where “c” stands for “cell”
LSTM Cell
140. Recurrent Neural Network
● We can think of h(t) as the short-term state and c(t) as the long-term
state
LSTM Cell
141. Recurrent Neural Network
Understanding the LSTM cell structure
● The key idea is that the network can learn
○ What to store in the long-term state,
○ What to throw away,
○ And what to read from it
LSTM Cell
142. Recurrent Neural Network
As the long-term state c(t–1) traverses the network from left to right, it
first goes through a forget gate, dropping some memories
Understanding the LSTM cell structure
LSTM Cell
143. Recurrent Neural Network
Understanding the LSTM cell structure
LSTM Cell
And then it adds some new memories via the addition operation, which
adds the memories that were selected by an input gate
144. Recurrent Neural Network
The result c(t) is sent straight out, without any further transformation.
So, at each time step, some memories are dropped and some memories
are added
Understanding the LSTM cell structure
LSTM Cell
145. Recurrent Neural Network
Moreover, after the addition operation, the long-term state is copied and
passed through the tanh function, and then the result is filtered by the
output gate
Understanding the LSTM cell structure
LSTM Cell
146. Recurrent Neural Network
This produces the short-term state h(t), which is equal to the cell’s
output for this time step, y(t)
Understanding the LSTM cell structure
LSTM Cell
149. Recurrent Neural Network
First, the current input vector x(t) and the previous short-term state
h(t–1) are fed to four different fully connected layers. They all serve a
different purpose
Understanding the LSTM cell structure
LSTM Cell
150. Recurrent Neural Network
The main layer is the one that outputs g(t). It has the usual role of
analyzing the current inputs x(t) and the previous short-term state
h(t–1). In an LSTM cell this layer’s output is partially stored in the
long-term state
Understanding the LSTM cell structure
LSTM Cell
151. Recurrent Neural Network
The three other layers are gate controllers. Since they use the logistic
activation function, their outputs range from 0 to 1
Understanding the LSTM cell structure
LSTM Cell
152. Recurrent Neural Network
● This summarizes how to compute the cell’s long-term state, its
short-term state, and its output at each time step for a single instance
● The equations for a whole mini-batch are very similar (they are
reproduced below)
Understanding the LSTM cell structure
LSTM Cell
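For reference, the computations summarized on this slide are the standard LSTM equations (σ is the logistic function, ⊗ denotes element-wise multiplication):

i^{(t)} = \sigma(W_{xi}^T x^{(t)} + W_{hi}^T h^{(t-1)} + b_i)
f^{(t)} = \sigma(W_{xf}^T x^{(t)} + W_{hf}^T h^{(t-1)} + b_f)
o^{(t)} = \sigma(W_{xo}^T x^{(t)} + W_{ho}^T h^{(t-1)} + b_o)
g^{(t)} = \tanh(W_{xg}^T x^{(t)} + W_{hg}^T h^{(t-1)} + b_g)
c^{(t)} = f^{(t)} \otimes c^{(t-1)} + i^{(t)} \otimes g^{(t)}
y^{(t)} = h^{(t)} = o^{(t)} \otimes \tanh(c^{(t)})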
153. Recurrent Neural Network
Conclusion
● An LSTM cell can learn to
○ Recognize an important input, that’s the role of the input gate,
○ Store it in the long-term state,
○ Learn to preserve it for as long as it is needed, that’s the role of the
forget gate,
○ And learn to extract it whenever it is needed
This explains why they have been amazingly successful at capturing
long-term patterns in time series, long texts, audio recordings, and more.
LSTM Cell
154. Recurrent Neural Network
Peephole Connections
● In a basic LSTM cell, the gate controllers can look only at the input x(t)
and the previous short-term state h(t–1)
● It may be a good idea to give them a bit more context by letting them
peek at the long-term state as well
● This idea was proposed by Felix Gers and Jürgen Schmidhuber in
2000
155. Recurrent Neural Network
● They proposed an LSTM variant with extra connections called
peephole connections:
○ The previous long-term state c(t–1) is added as an input to the
controllers of the forget gate and the input gate
○ And the current long-term state c(t) is added as an input to the
controller of the output gate
Peephole Connections
157. Recurrent Neural Network
Peephole Connections
To implement peephole connections in TensorFlow, you must use the
LSTMCell instead of the BasicLSTMCell and set use_peepholes=True:
>>> lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_neurons,
use_peepholes=True)
There are many other variants of the LSTM cell.
One particularly popular variant is the GRU cell, which we will look at now.
159. Recurrent Neural Network
GRU Cell
The Gated Recurrent Unit (GRU) cell was proposed by Kyunghyun Cho
et al. in a 2014 paper that also introduced the Encoder–Decoder network
we discussed earlier
160. Recurrent Neural Network
GRU Cell
● The GRU cell is a simplified version of the LSTM cell
● It seems to perform just as well
● This explains its growing popularity
161. Recurrent Neural Network
GRU Cell
The main simplifications are:
● Both state vectors are merged into a single vector h(t)
162. Recurrent Neural Network
The main simplifications are:
● A single gate controller controls both the forget gate and the input
gate. If the gate controller outputs a 1, the input gate is open and the
forget gate is closed
GRU Cell
163. Recurrent Neural Network
The main simplifications are:
● If it outputs a 0, the opposite happens. In other words, whenever a
memory must be stored, the location where it will be stored is erased
first. This is actually a frequent variant of the LSTM cell in and of itself
GRU Cell
164. Recurrent Neural Network
The main simplifications are:
● There is no output gate; the full state vector is output at every time
step. However, there is a new gate controller that controls which part of
the previous state will be shown to the main layer (see the equations
below)
GRU Cell
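For reference, a common presentation of the GRU equations matching this description is (sign conventions for z^{(t)} vary slightly across references):

z^{(t)} = \sigma(W_{xz}^T x^{(t)} + W_{hz}^T h^{(t-1)} + b_z)
r^{(t)} = \sigma(W_{xr}^T x^{(t)} + W_{hr}^T h^{(t-1)} + b_r)
g^{(t)} = \tanh(W_{xg}^T x^{(t)} + W_{hg}^T (r^{(t)} \otimes h^{(t-1)}) + b_g)
h^{(t)} = (1 - z^{(t)}) \otimes h^{(t-1)} + z^{(t)} \otimes g^{(t)}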
166. Recurrent Neural Network
Implementing GRU cell in TensorFlow
>>> gru_cell = tf.contrib.rnn.GRUCell(num_units=n_neurons)
● LSTM or GRU cells are one of the main reasons behind the success of
RNNs in recent years
● In particular for applications in natural language processing (NLP)
GRU Cell
168. Recurrent Neural Network
Natural Language Processing
● Most of the state-of-the-art NLP applications, such as
○ Machine translation,
○ Automatic summarization,
○ Parsing,
○ Sentiment analysis,
○ and more, are now based on RNNs
Now we will take a quick look at what a machine translation model looks
like.
This topic is very well covered by TensorFlow’s awesome Word2Vec and
Seq2Seq tutorials, so you should definitely check them out
169. Recurrent Neural Network
Natural Language Processing - Word Representation
Before we start, we need to answer this important question
How do we represent a “word” ??
170. Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
We need to convert everything into numbers.
What can we do about climate?
temp | climate | comments
12   | Cold    | Very nice place to visit in summers
30   | Hot     | Do not visit. This is a trap
171. Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
We need to convert everything into numbers.
What can we do about climate?
We can convert it into One-Hot vector
temp | climate | comments
12   | Cold    | Very nice place to visit in summers
30   | Hot     | Do not visit. This is a trap

temp | climate_cold | climate_hot | comments
12   | 1            | 0           | Very nice place to visit in summers
30   | 0            | 1           | Do not visit. This is a trap
172. Recurrent Neural Network
Natural Language Processing - Word Representation
In order to apply algorithms,
We need to convert everything into numbers.
And what can we do about comments?
temp | climate | comments
12   | Cold    | Very nice place to visit in summers
30   | Hot     | Do not visit. This is a trap
173. Recurrent Neural Network
One option could be to represent each word using a one-hot vector.
But consider this :
● Suppose your vocabulary contains 50,000 words
● Then the nth word would be represented as a 50,000-dimensional
vector, full of 0s except for a 1 at the nth position
● However, with such a large vocabulary, this sparse representation would
not be efficient at all
Natural Language Processing - Word Representation
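A minimal sketch of what such a one-hot vector looks like (the index 1234 is just an illustrative choice):
>>> import numpy as np
>>> vocabulary_size = 50000
>>> n = 1234 # hypothetical index of the nth word
>>> one_hot = np.zeros(vocabulary_size)
>>> one_hot[n] = 1.0 # all zeros except a single 1 at position n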
174. Recurrent Neural Network
● Ideally, we want similar words to have similar representations,
making it easy for the model to generalize what it learns about a word to
all similar words
● For example,
○ If the model is told that “I drink milk” is a valid sentence, and if it
knows that “milk” is close to “water” but far from “shoes”
○ Then it will know that “I drink water” is probably a valid sentence
as well
○ While “I drink shoes” is probably not
But how can you come up with such a meaningful representation?
Natural Language Processing - Word Representation
175. Recurrent Neural Network
● The most common solution is to represent each word in the vocabulary
using a fairly small and dense vector e.g., 150 dimensions, called an
Embedding
● And just let the neural network learn a good embedding for each word
during training
Natural Language Processing - Word Embedding
176. Recurrent Neural Network
With word embeddings, a lot of magic is possible:
king - man + woman == queen
Natural Language Processing - Word Embedding
177. Recurrent Neural Network
from gensim.models import KeyedVectors
# load the google word2vec model
filename = 'GoogleNews-vectors-negative300.bin'
model = KeyedVectors.load_word2vec_format(filename, binary=True)
# calculate: (king - man) + woman = ?
result = model.most_similar(positive=['woman', 'king'],
negative=['man'], topn=1)
print(result)
Word Embedding - word2vec
● Based on the contexts in which words appear, researchers have generated such vectors.
● One such embedding is word2vec; another is GloVe.
[('queen', 0.7118192315101624)]
178. Recurrent Neural Network
Word Embedding - Vector space models (VSMs)
Based on the Distributional Hypothesis:
○ words that appear in the same contexts share semantic meaning.
Two Approaches:
1. Count-based methods (e.g. Latent Semantic Analysis)
2. Predictive methods (e.g. neural probabilistic language models)
179. Recurrent Neural Network
Word Embedding - word2vec - Approaches
1. Count-based methods (e.g. Latent Semantic Analysis)
○ Compute the statistics of how often some word co-occurs with its
neighbor words in a large text corpus
○ Map these count-statistics down to a small, dense vector for each
word
180. Recurrent Neural Network
2. Predictive models
○ Directly try to predict a word from its neighbors
○ in terms of learned small, dense embedding vectors
○ (considered parameters of the model).
Word Embedding - word2vec - Approaches
181. Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
2. Skip-Gram model
182. Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
○ predicts target words (e.g., 'mat') from source context words
○ (e.g., 'the cat sits on the')
2. Skip-Gram model
183. Recurrent Neural Network
Computationally-efficient predictive model
for learning word embeddings from raw text.
word2vec
Comes in two flavors:
1. Continuous Bag-of-Words model (CBOW)
○ predicts target words (e.g., 'mat') from source context words
○ (e.g., 'the cat sits on the')
2. Skip-Gram model
○ Predicts source context-words from the target words
○ Treats each context-target pair as a new observation
○ Tends to do better when we have larger datasets.
○ Will focus on this
184. Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
185. Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function:
word2vec: Scaling up Noise-Contrastive Training
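The softmax in question (reproduced here as given in TensorFlow's word2vec tutorial) is:

P(w_t \mid h) = \mathrm{softmax}(\mathrm{score}(w_t, h))
= \frac{\exp\{\mathrm{score}(w_t, h)\}}{\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}}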
where score(wt, h) computes the compatibility of word wt with the context h (a dot
product is commonly used). We train this model by maximizing its log-likelihood, i.e.
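That is (again following the same tutorial):

J_{ML} = \log P(w_t \mid h)
= \mathrm{score}(w_t, h) - \log\Big(\sum_{w' \in V} \exp\{\mathrm{score}(w', h)\}\Big)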
186. Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
where score(wt, h) computes the compatibility of word wt with the context h (a dot
product is commonly used). We train this model by maximizing its log-likelihood,
i.e., maximizing log P(wt | h) as above.
This is very expensive, because we need to compute and normalize each probability
using the score for all other words w' in the vocabulary V, in the current context, at
every training step.
187. Recurrent Neural Network
Neural probabilistic language models
● are traditionally trained using the maximum likelihood (ML) principle
● to maximize the probability of the next word wt (for "target")
● given the previous words h (for "history"), in terms of a softmax function
word2vec: Scaling up Noise-Contrastive Training
This is very expensive, because we need to compute and normalize each probability
using the score for all other words w' in the vocabulary V, in the current context, at
every training step.
188. Recurrent Neural Network
Instead, the model is trained using a binary classification objective (logistic regression)
to discriminate the real target words wt from k imaginary (noise) words w̃, in the same
context.
word2vec: Scaling up Noise-Contrastive Training
1. Computing the loss function now scales only with the number of noise words that we select
and not all words in the vocabulary
2. This makes it much faster to train.
3. We will use the similar noise-contrastive estimation (NCE) loss, tf.nn.nce_loss(), sketched below.
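A minimal sketch of this loss, following TensorFlow's word2vec tutorial (nce_weights, nce_biases, train_labels, embed, and num_sampled are assumed to be defined as in that tutorial):
>>> loss = tf.reduce_mean(tf.nn.nce_loss(
weights=nce_weights, # [vocabulary_size, embedding_size] weight matrix
biases=nce_biases, # [vocabulary_size] bias vector
labels=train_labels, # true context words, shape [batch_size, 1]
inputs=embed, # embeddings of the input words
num_sampled=num_sampled, # number of noise words k
num_classes=vocabulary_size))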
189. Recurrent Neural Network
the quick brown fox jumped over the lazy dog
Word2vec: Context Example
([the, brown], quick), ([quick, fox], brown), ([brown, jumped], fox), ...
Context: word to the left and word to the right.
190. Recurrent Neural Network
the quick brown fox jumped over the lazy dog
Word2vec: Skip Gram Model
(quick, the), (quick, brown), (brown, quick), (brown, fox), ...
Task becomes to predict 'the' and 'brown' from 'quick', 'quick' and 'fox' from 'brown', etc.
Skip-gram
● inverts contexts and targets, and
● tries to predict each context word from its target word
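An illustrative sketch of how such (target, context) pairs could be generated (this helper is hypothetical, not part of any library):

def skipgram_pairs(words, window=1):
    # pair each target word with every context word within the window
    pairs = []
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                pairs.append((target, words[j]))
    return pairs

print(skipgram_pairs("the quick brown fox".split()))
# [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ('brown', 'quick'), ...]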
191. Recurrent Neural Network
Natural Language Processing - Word Embedding
Let's imagine what happens at training step t:
● For the first case above, the goal is to predict the from quick.
● We select num_noise noisy (contrastive) examples
○ by drawing from some noise distribution,
○ typically the unigram distribution
● For simplicity let's say num_noise=1 and we select sheep as the noisy
example. Next we compute the loss for this pair of observed and noisy
examples
193. Recurrent Neural Network
Natural Language Processing - Word Embedding
● The goal is to make an update to the embedding parameters
● to improve (in this case, maximize) the objective function
● We do this by deriving the gradient of the loss with respect to the
embedding parameters θ (luckily TensorFlow provides easy helper
functions for doing this!)
● We then perform an update to the embeddings by taking a small step in
the direction of the gradient. When this process is repeated over the
entire training set, this has the effect of 'moving' the embedding vectors
around for each word until the model is successful at discriminating real
words from noise words.
194. Recurrent Neural Network
Natural Language Processing - Word Embedding
● At the beginning of training, embeddings are simply chosen randomly,
● But during training, backpropagation automatically moves the
embeddings around in a way that helps the neural network perform its
task
195. Recurrent Neural Network
Natural Language Processing - Word Embedding
● Typically this means that similar words will gradually cluster close to one
another, and even end up organized in a rather meaningful way.
● For example, embeddings may end up placed along various axes that
represent
○ gender,
○ singular/plural,
○ adjective/noun,
○ and so on
196. Recurrent Neural Network
Natural Language Processing - Word Embedding
How to do it in TensorFlow
In TensorFlow, we first need to create the variable representing the
embeddings for every word in our vocabulary, which is initialized randomly:
>>> vocabulary_size = 50000
>>> embedding_size = 150
>>> embeddings = tf.Variable(
tf.random_uniform([vocabulary_size, embedding_size],
-1.0, 1.0))
197. Recurrent Neural Network
How to do it in TensorFlow - Preprocessing
Suppose we want to feed the sentence “I drink milk” to our neural
network.
● We should first preprocess the sentence and break it into a list of
known words
● For example
○ We may remove unnecessary characters, replace unknown words by a
predefined token word such as “[UNK]”,
○ Replace numerical values by “[NUM]”,
○ Replace URLs by “[URL]”,
○ And so on
Natural Language Processing - Word Embedding
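A minimal sketch of such a preprocessing step (word_to_id and the token names are assumptions for illustration):

def preprocess(sentence, word_to_id):
    # naive tokenization; a real pipeline would also handle [NUM], [URL], etc.
    words = sentence.lower().split()
    return [word_to_id.get(w, word_to_id["[UNK]"]) for w in words]

# e.g., preprocess("I drink milk", word_to_id) might return [72, 3335, 288]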
198. Recurrent Neural Network
How to do it in TensorFlow
● Once we have a list of known words, we can look up each word’s integer
identifier from 0 to 49999 in a dictionary, for example [72, 3335, 288]
● At that point, we are ready to feed these word identifiers to TensorFlow
using a placeholder, and apply the embedding_lookup() function to get
the corresponding embeddings
>>> train_inputs = tf.placeholder(tf.int32, shape=[None]) # from ids...
>>> embed = tf.nn.embedding_lookup(embeddings, train_inputs) # ...to embeddings
Natural Language Processing - Word Embedding
199. Recurrent Neural Network
● Once our model has learned good word embeddings, it can actually be
reused fairly efficiently in any NLP application
● In fact, instead of training our own word embeddings, we may want to
download pre-trained word embeddings
● Just like when reusing pretrained layers, we can choose to
○ Freeze the pretrained embeddings
○ Or let backpropagation tweak them for our application
● The first option will speed up training, but the second may lead to slightly
higher performance
Natural Language Processing - Word Embedding
201. Recurrent Neural Network
Machine Translation
We now have almost all the tools we need to implement a
machine translation system
Let’s look at this now
202. Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
Let’s take a look at a simple machine translation model that will translate
English sentences to French
203. Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
A simple machine translation model
205. Recurrent Neural Network
The English sentences are fed to the encoder, and the decoder outputs
the French translations
Machine Translation
An Encoder–Decoder Network for Machine Translation
206. Recurrent Neural Network
Note that the French translations are also used as inputs to the decoder,
but pushed back by one step
Machine Translation
An Encoder–Decoder Network for Machine Translation
207. Recurrent Neural Network
Machine Translation
An Encoder–Decoder Network for Machine Translation
In other words, the decoder is given as input the word that it should
have output at the previous step, regardless of what it actually output
at that step
208. Recurrent Neural Network
For the very first word, the decoder is given a token that represents the
beginning of the sentence (here, “<go>”).
The decoder is expected to end the sentence with an end-of-sequence
(EOS) token (here, “<eos>”)
Machine Translation
An Encoder–Decoder Network for Machine Translation
209. Recurrent Neural Network
Question: Why are the English sentences reversed before feeding them
to the encoder?
Here “I drink milk” is reversed to “milk drink I”
Machine Translation
An Encoder–Decoder Network for Machine Translation
210. Recurrent Neural Network
Answer: This ensures that the beginning of the English sentence will be
fed last to the encoder, which is useful because that’s generally the first
thing that the decoder needs to translate
Machine Translation
An Encoder–Decoder Network for Machine Translation
211. Recurrent Neural Network
● Each word is initially represented by a simple integer identifier
(e.g., 288 for the word “milk”)
Machine Translation
An Encoder–Decoder Network for Machine Translation
212. Recurrent Neural Network
● Next, an embedding lookup returns the word embedding
● This is a dense, fairly low-dimensional vector
● These word embeddings are what is actually fed to the encoder and
the decoder
Machine Translation
An Encoder–Decoder Network for Machine Translation
213. Recurrent Neural Network
● At each step, the decoder outputs a score for each word in the output
vocabulary (i.e., French)
Machine Translation
An Encoder–Decoder Network for Machine Translation
214. Recurrent Neural Network
● And then the Softmax layer turns these scores into probabilities
Machine Translation
An Encoder–Decoder Network for Machine Translation
215. Recurrent Neural Network
● For example, at the first step the word “Je” may have a probability of
20%, “Tu” may have a probability of 1%, and so on
● The word with the highest probability is output
Machine Translation
An Encoder–Decoder Network for Machine Translation
216. Recurrent Neural Network
How can we use this Encoder–Decoder Network for Machine Translation
at inference time, since we will not have the target sentence to feed to
the decoder?
Machine Translation
An Encoder–Decoder Network for Machine Translation
217. Recurrent Neural Network
● We will simply feed the decoder the word that it output at the previous
step
● This will require an embedding lookup that is not shown on the diagram
Machine Translation
An Encoder–Decoder Network for Machine Translation