From the course: GPT-4: The New GPT Release and What You Need to Know

What are Large Language Models and GPT?

- [Announcer] Over the last few months, GPT and ChatGPT have been popular buzzwords, but things kicked off long before that in natural language processing. A type of AI model architecture called the Transformer was proposed by a team of researchers from Google in 2017 in a paper called "Attention Is All You Need." All large language models use components of the Transformer as part of their architecture.

What's remarkable is that you can interact with these models with plain English text, called a prompt. For example, you can say something like, "Summarize the following text," and provide the text, and the large language model will respond with a summarized version of the text you gave it. Now, sometimes you won't get the output you expect, so you can change the prompt, and you might get another, and hopefully better, answer.

The language model is made up of parameters, or weights. Initially, these parameters have random values, and if you were to prompt it then, it would return just gibberish. But if you train the model, passing it a large corpus of data like the entire English Wikipedia, Common Crawl (which is web-crawled data), and some other data sources, it adjusts these parameters as part of the training process. Now, when you prompt it, it doesn't return gibberish, but starts providing output similar to the data it was trained on. Just to be clear, the language model isn't thinking; it's just returning text similar to the patterns it picked up during training. These language models are usually called large language models because they have millions, and usually billions, of parameters.

GPT was the first well-known large language model. It doesn't work with words. Instead, it works with parts of words, known as tokens, which are around four characters long. G stands for generative, as we're predicting a future token given past tokens. P is for pre-trained, as it's trained on a large corpus of data, including English Wikipedia among several others. This training involves significant compute time and costs. And finally, the T corresponds with the fact that we are using a portion of the Transformer architecture.

GPT-3's objective was simple: given the preceding tokens, it needed to predict the next ones. This is known as a causal, or autoregressive, language model. The way this works is very similar to how predictive text works on your phone. For example, if you type "roses," the next word is likely to be "are," followed by "red," and so on.

All right, we've looked at what large language models are and how you can interact with them by providing a prompt as input. They're trained on large amounts of data, which is why they return intelligent-sounding text. The latest version of GPT, GPT-4, was released in March 2023, and that's what we'll look at next.
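
To make the prompting idea from the transcript concrete, here is a minimal sketch of sending a "Summarize the following text" prompt to a GPT-style model. It assumes the official openai Python package (v1 or later) is installed and an OPENAI_API_KEY environment variable is set; the model name "gpt-4" and the sample text are illustrative assumptions, not part of the course.

```python
# Minimal sketch: prompting a large language model with plain English text.
# Assumes the `openai` package (v1+) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

text = "Large language models are built from components of the Transformer architecture..."

response = client.chat.completions.create(
    model="gpt-4",  # assumed model name for illustration
    messages=[
        {"role": "user", "content": f"Summarize the following text:\n{text}"}
    ],
)

# The model generates its reply one token at a time, each new token
# conditioned on the prompt plus the tokens produced so far (autoregressive).
print(response.choices[0].message.content)
```

If the summary isn't what you expected, you would simply edit the prompt string and call the model again, which is the prompt-tweaking loop the transcript describes.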
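To illustrate the claim that models work with tokens of roughly four characters rather than with whole words, here is a small sketch using the tiktoken library. The library choice and the "cl100k_base" encoding (used by GPT-4-era models) are assumptions for illustration; the course does not name a specific tokenizer.

```python
# Minimal sketch: splitting a prompt into tokens.
# Assumes the `tiktoken` package is installed (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # assumed GPT-4-era encoding

prompt = "Summarize the following text: Roses are red, violets are blue."
token_ids = enc.encode(prompt)

print(token_ids)                                # integer token IDs
print([enc.decode([t]) for t in token_ids])     # the text piece behind each token
print(len(prompt) / len(token_ids))             # average characters per token, roughly 4
```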
