validation set

What is a validation set?

A validation set is a set of data that is held out while an artificial intelligence (AI) model is trained and is used to evaluate, compare and tune candidate models, with the goal of finding the best model to solve a given problem. Validation sets are also known as dev sets.

Supervised machine learning models are trained on very large sets of labeled data, and validation data sets play an important role in their creation.

Training, tuning, model selection and testing are performed with three different sets of data: train, test and validation. Validation sets are used to select and tune the AI model.

A validation data set is a sample of data that is withheld from training. The model is evaluated on that data to reveal errors that its performance on the training data does not show. Machine learning engineers can then tune the model's hyperparameters, which are adjustable settings that control the behavior of the model and how it is trained. Because the model's weights are never fit to the validation data, the set serves as an independent benchmark for comparing the performance of candidate models.
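As a minimal sketch of tuning a hyperparameter against withheld data, the example below picks the number of neighbors `k` for a small k-nearest-neighbors classifier. The synthetic data, the 240/60 split, the candidate values and the helper names `knn_predict` and `accuracy` are all assumptions made for illustration, not part of any particular library.

```python
import random

def knn_predict(train, x, k):
    """Predict the majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0

def accuracy(train, held_out, k):
    """Fraction of held-out samples that the k-NN model labels correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return correct / len(held_out)

# Hypothetical data: the label is 1 when x > 0.5, with 15% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.15:
        y = 1 - y
    data.append((x, y))

# Withhold 60 samples from training; the model never learns from them.
train, validation = data[:240], data[240:]

# k is a hyperparameter: it is chosen by the engineer, not learned from data.
candidate_ks = [1, 3, 5, 9, 15]
scores = {k: accuracy(train, validation, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)

print("validation accuracy per k:", scores)
print("selected k:", best_k)
```

Only the validation scores decide which `k` is kept; the training samples are used solely to make predictions.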

Even though validation data is drawn from the same labeled data as the training set, it is not used to fit the model and is not part of the final testing process. This allows it to act as a largely unbiased evaluation of a model during development.

What are the differences between train, validation and test data sets?

Validation data sets are an important part of building AI, machine learning and deep learning models, along with training and test data sets. Models use these data sets to identify and learn from data such as text and images. After training, the models can be applied to areas such as text and image generation, natural language understanding or medical applications. Training, validation and test data sets are all used to prepare the model for operation, but are used at different points in its development:

Figure: How a data set for ML is separated. The complete labeled data set is separated into a large training set and smaller validation and test data sets.
  • The training set is the portion of data used to train models. The model learns from this data. In training, the model's parameters are fit to the data in a process that is known as adjusting weights. Training data makes up most of the total data.
  • Test sets are only used once the final model is completely trained. These sets contain carefully chosen data that covers the different scenarios the model would face in operation. The test set is used to assess the performance of the final model.
  • The validation set is a portion of the labeled data that is held out from training to provide an unbiased evaluation of a model. The validation data set contrasts with training and test sets in that it is used in an intermediate phase for choosing the best model and optimizing it. It is in this phase that hyperparameter tuning occurs. The validation set is also used to check for overfitting, which occurs when a model corresponds too precisely to its training data and, as a result, makes errors on future predictions and observations.
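The three-way separation described above can be sketched in a few lines. This is an illustrative helper only: the name `split_dataset`, the 70/15/15 proportions and the fixed seed are assumptions chosen for the example, not recommended values.

```python
import random

def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the samples and split them into train, validation and test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # shuffle so each split is representative

    n_val = round(len(shuffled) * val_fraction)
    n_test = round(len(shuffled) * test_fraction)
    n_train = len(shuffled) - n_val - n_test  # training keeps most of the data

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test

# Hypothetical data set of 1,000 labeled samples, represented here by their IDs.
train, validation, test = split_dataset(range(1000))
print(len(train), len(validation), len(test))
```

Each sample lands in exactly one of the three sets, so no validation or test sample is ever seen during training.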

How the data is split among training, validation and test sets depends on the number of data samples and the model being trained. Some models require significantly more data to train than others. Likewise, the more hyperparameters there are to tune, the larger the validation split needs to be. It is also generally considered unwise to attempt further adjustment after the testing phase. Tuning against test results leaks information from the test set into the model, so the test score no longer reflects performance on unseen data and the model is likely to overfit.
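One rough way to reason about how large a validation split needs to be is the standard error of an accuracy estimate, which shrinks as the number of validation samples grows. The sketch below uses the binomial approximation sqrt(p(1 - p) / n) and assumes the validation samples are independent. The 90% accuracy figure and the sample sizes are hypothetical, and the function name is chosen only for this example.

```python
import math

def accuracy_standard_error(accuracy, n_samples):
    """Approximate standard error of an accuracy measured on n_samples examples."""
    return math.sqrt(accuracy * (1 - accuracy) / n_samples)

# Hypothetical model with 90% validation accuracy at different validation sizes.
errors = {n: accuracy_standard_error(0.9, n) for n in (100, 1000, 10000)}
for n, se in errors.items():
    print(f"{n} validation samples: standard error of about {se:.4f}")
```

When many hyperparameter settings are compared, the differences between candidates can be smaller than this uncertainty, which is one reason a larger validation set helps.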


This was last updated in September 2023
