What Is Supervised Learning?

Supervised learning uses labeled sets of data to train AI

Supervised learning is a type of machine learning that uses labeled sets of data to train artificial intelligence (AI). Here's what supervised learning is all about, how it works, and its applications.

How Does Supervised Learning Work?

In supervised learning, an AI algorithm is fed training data (inputs) with clear labels (outputs). Based on the training set, the AI learns how to label future inputs of unlabeled data. Ideally, the algorithm will improve its accuracy as it learns from past experiences.

If you wanted to train an AI algorithm to classify shapes, you would show it many examples of accurately labeled shapes (for example, three-sided shapes labeled “triangle” and shapes with four equal sides and four right angles labeled “square”). The algorithm isn't given the reasoning behind each label; it infers the patterns from the examples.

Once you've provided the training data, you would test the algorithm by showing it shapes without labels. The AI will then use its knowledge from the training set to assign the appropriate labels/outputs to each shape.
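The train-then-test workflow can be sketched in a few lines of Python. This is only an illustration, not a production method: the feature encoding (number of sides, number of right angles), the tiny data set, and the `predict` helper are all assumptions made for this example, and the technique used here is a simple k-nearest-neighbors vote.

```python
from collections import Counter

# Hypothetical training set: each shape is described by
# (number of sides, number of right angles) and paired with its label.
TRAINING_DATA = [
    ((3, 0), "triangle"),
    ((3, 1), "triangle"),
    ((3, 0), "triangle"),
    ((4, 4), "square"),
    ((4, 4), "square"),
]


def distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(features, training_data=TRAINING_DATA, k=3):
    """Label a shape by majority vote among its k nearest training examples."""
    nearest = sorted(training_data, key=lambda pair: distance(pair[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Testing: shapes presented without labels.
print(predict((3, 1)))  # prints "triangle"
print(predict((4, 4)))  # prints "square"
```

Note that the code never states a rule such as "three sides means triangle"; the labels come entirely from the labeled examples.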

Applications of Supervised Learning

Supervised learning is used to train AI algorithms to perform many tasks, including:

  • Image recognition: Identifying objects or people in videos and images.
  • Speech recognition: Matching voices with people and transcribing audio to text.
  • Handwriting recognition: Converting handwritten letters into digital text.
  • Spam detection: Filtering email and text messages that look like spam.
  • Fraud detection: Flagging abnormal financial transactions.
  • Geographic mapping: Classifying land formations based on satellite images.
  • News categorization: Sorting news stories based on topic or region.
  • Marketing: Targeting ads based on user demographics (age, location, etc.).
  • Predictive analytics: Forecasting outcomes, such as sales or prices, from historical data to guide financial decisions.

Limitations of Supervised Learning

To adequately train a supervised learning algorithm, you need a lot of accurately labeled data. The training data set must also be diverse enough for the algorithm to identify slight pattern variances.

One of the benefits of supervised learning is that it can be highly accurate, but high accuracy isn't always good. That's because it could indicate overfitting, which is when the algorithm fits the training data so closely (including its quirks and errors) that it performs poorly on new data. Overfitting can go unnoticed if the training and test data are too similar. When you test the algorithm, the test data should be different enough from the training set to ensure it will work in real-world settings.
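A common way to check for this problem is to compare accuracy on the training data with accuracy on data that was held back from training. The sketch below uses a made-up one-feature data set in which one training example is deliberately mislabeled; the `memorizer` and `fit_threshold` functions are hypothetical names for this illustration only.

```python
# Hypothetical data. The true rule is "label 1 when x > 5", but the
# training example at x = 9 is mislabeled to simulate a labeling error.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 0)]
test = [(0.5, 0), (2.5, 0), (4.5, 0), (5.5, 1), (7.5, 1), (8.6, 1), (9.5, 1), (10, 1)]


def memorizer(x):
    """Overfit model: copy the label of the closest training example."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def fit_threshold(data):
    """Simple model: choose the cut-off that misclassifies the fewest training examples."""
    xs = sorted(x for x, _ in data)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def errors(t):
        return sum((x > t) != bool(y) for x, y in data)

    return min(candidates, key=errors)


def accuracy(model, data):
    """Fraction of examples for which the model's output matches the label."""
    return sum(model(x) == y for x, y in data) / len(data)


threshold = fit_threshold(train)


def simple_model(x):
    return int(x > threshold)


print("memorizer:", accuracy(memorizer, train), accuracy(memorizer, test))
print("simple:   ", accuracy(simple_model, train), accuracy(simple_model, test))
```

In this example, the memorizing model scores 1.0 on the training set but only 0.625 on the held-out set, while the simpler threshold model scores 0.875 on the training set and 1.0 on the held-out set. A perfect training score paired with a much lower test score is the typical signature of overfitting.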

Supervised vs. Unsupervised Learning

The training data is unlabeled in unsupervised learning, so the AI must identify patterns and create its own labels. In semi-supervised learning, part of the input data is already labeled.

Supervised learning can be time-consuming since it requires a human, or supervisor, to label the data in the training set. A supervisor must also test the algorithm for accuracy. This introduces the possibility of human error, so the labeling should be done carefully, ideally by someone who knows the subject matter well.

Unlike unsupervised learning, supervised learning algorithms can only assign the labels they were trained on. So, if a supervised learning algorithm trained to identify triangles and squares is presented with a hexagon, it wouldn't be able to label it correctly; it would typically assign whichever known label fits best. An unsupervised algorithm, by contrast, could group hexagons separately from triangles and squares, effectively creating a new category.
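The hexagon scenario can be illustrated with a toy sketch. Everything here is an assumption made for the example: shapes are reduced to a single number (their side count), the supervised model is a nearest-example lookup, and the "unsupervised" step is simply grouping identical side counts, which is a stand-in for a real clustering algorithm.

```python
from collections import defaultdict

# Hypothetical labeled examples: (number of sides, label).
LABELED = [(3, "triangle"), (3, "triangle"), (4, "square"), (4, "square")]


def supervised_predict(sides):
    """Return the label of the closest training example; only known labels are possible."""
    return min(LABELED, key=lambda pair: abs(pair[0] - sides))[1]


def unsupervised_groups(shapes):
    """Group unlabeled shapes by side count; no labels are used or produced."""
    groups = defaultdict(list)
    for sides in shapes:
        groups[sides].append(sides)
    return list(groups.values())


print(supervised_predict(6))                    # prints "square"
print(unsupervised_groups([3, 4, 6, 3, 6, 4]))  # prints [[3, 3], [4, 4], [6, 6]]
```

The supervised model has no "hexagon" label to offer, so the six-sided shape ends up with the nearest known label. The grouping step puts the hexagons in their own group, although it cannot name that group.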

Types of Supervised Learning Algorithms

Supervised learning algorithms can be divided into two types:

  • Classification: In classification algorithms, the output is a category. These algorithms are ideal for binary classifications, such as deciding whether or not an email is spam, but they can be used for more complicated sorting, such as organizing a list of drugs by class.
  • Regression: In regression algorithms, the output is a numerical value. These algorithms make predictions, such as estimating real estate values based on zip code or forecasting the temperature based on the time of day.
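The difference between the two output types can be sketched as follows. The data (morning temperatures and link counts in emails) is invented for illustration, and both helpers are deliberately minimal: `fit_line` is ordinary least squares for one input, and `fit_boundary` places a cut-off halfway between the two class averages, which is just one simple way to build a classifier.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single input."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def fit_boundary(spam_values, not_spam_values):
    """Cut-off halfway between the average of each class."""
    spam_mean = sum(spam_values) / len(spam_values)
    not_spam_mean = sum(not_spam_values) / len(not_spam_values)
    return (spam_mean + not_spam_mean) / 2


# Regression: hypothetical (hour of day, temperature) pairs -> a number.
hours = [6, 8, 10, 12]
temperatures = [10.0, 14.0, 18.0, 22.0]
slope, intercept = fit_line(hours, temperatures)
print(slope * 9 + intercept)  # prints 16.0

# Classification: hypothetical link counts per email -> a category.
boundary = fit_boundary(spam_values=[5, 7, 8], not_spam_values=[0, 1, 2])
print("spam" if 6 > boundary else "not spam")  # prints "spam"
```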

Within these two categories are several popular supervised learning algorithms like linear regression (regression), logistic regression (which, despite its name, is used for classification), and naive Bayes classifiers. Some algorithms, such as support vector machines (SVM) and random forests, can be used for both classification and regression.

Supervised learning is also commonly used to train neural networks: the network compares its outputs with the correct labels and adjusts its internal weights to reduce the error, gradually fine-tuning itself.
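The idea of a model checking its own outputs against labels and adjusting itself can be sketched with a single artificial neuron (a perceptron), the simplest building block of a neural network. This is an illustrative toy, not how modern networks are built: the task (learning the logical AND function), the learning rate, and the number of passes are all assumptions chosen for the example.

```python
# Labeled training data for logical AND: (inputs, correct output).
TRAINING_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs is positive, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(data, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights whenever a prediction is wrong."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, label in data:
            error = label - predict(weights, bias, inputs)  # 0 when correct
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


weights, bias = train(TRAINING_DATA)
print([predict(weights, bias, inputs) for inputs, _ in TRAINING_DATA])  # prints [0, 0, 0, 1]
```

Each wrong answer changes the weights slightly, and after a few passes over the labeled data the neuron reproduces every label in the training set.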

FAQ
  • What is self-supervised learning?

    Self-supervised learning is similar to supervised learning in that an algorithm uses past examples to identify new data. The difference is that in self-supervised learning, humans don't provide labels. Instead, the algorithm derives them from the data itself, for example by hiding part of an input and learning to predict the hidden part. It's also distinct from unsupervised learning in that later stages of a self-supervised training program can include some supervised tasks.

  • When do you use supervised vs. unsupervised learning?

    Supervised learning is most useful when you know in advance exactly what you want the program to identify and you have labeled examples of it. For example, autonomous car programmers really want vehicles to know a stop sign when they see one. Unsupervised learning is better suited to exploring data and building understanding of a particular field (e.g., physics) when you don't know in advance which patterns the data contains.
