
How to test a predictive model

Strategies for testing predictive models and analytics emphasize data quality, real-time testing and code redundancy, as well as AI and machine learning integration.

One of the simplest, most straightforward forms of AI is the predictive model. The predictive model, which uses the same kind of predictive logic that powers large language models such as GPT-4, might already be in use in your organization. It might predict demand for retail products, detect fraud, drive dynamic pricing strategies or recommend products to customers.

Writing code to recommend products could be as easy as joining a few tables in a database: find what people who bought that product also bought. Fundamentally, that's predicting. Predicting the future might seem impossible, but it can be done and done well -- or badly. A bad prediction might mean some retail products are overstocked and other shelves go empty. Organizations test to reduce that risk, yet testing a predictive model brings other challenges.

Predictive model testing strategies

Here are a few strategies for testing predictive models and analytics, along with some context and ideas for how to use AI and machine learning (ML) to help with the testing.

Create an obvious data set

For e-commerce recommendations, you can build a data set whose right answers are obvious. Have a large group of users buy the same two or three products together; maybe they also buy a fourth but give it one star. This kind of database can push the requirements with concrete examples. Ideally, you can save the database as is and import it later into a clean system for testing -- especially if page views are part of the algorithm. The obvious data set makes for obvious conclusions.
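
Here is a minimal sketch, in Python, of what such an obvious data set and its checks might look like. The purchase records, product names and the recommend() function are invented stand-ins for whatever your real recommender and schema look like.

from collections import Counter

# Hypothetical purchase history: (user, product, star_rating)
purchases = [
    *[(f"user{i}", "coffee_maker", 5) for i in range(50)],
    *[(f"user{i}", "filters", 5) for i in range(50)],
    *[(f"user{i}", "descaler", 4) for i in range(50)],
    *[(f"user{i}", "novelty_mug", 1) for i in range(50)],  # bought, but rated one star
]

def recommend(product, history, min_rating=3):
    """Recommend items that co-occur with `product`, skipping poorly rated ones."""
    buyers = {user for user, item, _ in history if item == product}
    counts = Counter(
        item for user, item, rating in history
        if user in buyers and item != product and rating >= min_rating
    )
    return [item for item, _ in counts.most_common()]

# The obvious data set leads to an obvious conclusion:
result = recommend("coffee_maker", purchases)
assert set(result) == {"filters", "descaler"}, result
assert "novelty_mug" not in result, "a one-star purchase should not be recommended"
print("obvious-data-set checks passed:", result)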

Test the input data

Any serious review of ERP system implementations shows that predictive algorithms might fail because of their design but will certainly fail when fed bad data. Worse, the meaning of data migrated from one system to another can change, even when the columns have the same names. Don't simply cut over the data; follow the workflow to see why that data is entered and what it means.
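
A few lightweight input checks can catch both missing or malformed values and the "same column name, different meaning" problem before the model ever sees the data. The sketch below assumes hypothetical field names and allowed codes; swap in your own schema and rules.

from datetime import datetime

# "unit_of_measure" may mean something different in the legacy system
ALLOWED_UOM = {"EA", "CS", "KG"}

def validate_order(row):
    """Return a list of problems found in one migrated order record."""
    problems = []
    if not row.get("order_date"):
        problems.append("missing order_date")
    else:
        try:
            datetime.strptime(row["order_date"], "%Y-%m-%d")
        except ValueError:
            problems.append(f"unparseable order_date: {row['order_date']!r}")
    if row.get("quantity") is None or row["quantity"] < 0:
        problems.append(f"bad quantity: {row.get('quantity')!r}")
    if row.get("unit_of_measure") not in ALLOWED_UOM:
        problems.append(f"unexpected unit_of_measure: {row.get('unit_of_measure')!r}")
    return problems

migrated = [
    {"order_date": "2024-11-02", "quantity": 3, "unit_of_measure": "EA"},
    {"order_date": "02/11/2024", "quantity": 3, "unit_of_measure": "EACH"},  # legacy formats
]

for i, row in enumerate(migrated):
    for problem in validate_order(row):
        print(f"row {i}: {problem}")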

Get live data and run the algorithm

In some cases, you can run the predictive model in real time and compare it to what's actually happening. Some companies have a shadow tool that lets you log in as an arbitrary user and see what that user sees. You could find the most active and most unusual users, look at their history and check whether what they see makes sense. Note that, in many industries, personally identifiable information must be anonymized, but anonymized data is usually still good enough for this kind of testing.

For example, one company I worked with was using analytics to predict when a driver would start a vehicle. It might be that the driver would enter the car somewhere between 5:25 p.m. and 5:45 p.m., if the GPS said the vehicle was parked at work on a weekday. If the conditions were right, the vehicle would check the temperature, start the engine and either turn on the heat or air conditioner. One easy way to test this is to drive a luxury vehicle under development -- or get data from a vehicle and run the algorithm.
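
Here is a rough sketch of that replay idea, with an invented prediction rule and made-up times. The point is the shape of the test: feed recorded data to the prediction logic and compare the result with what the vehicle actually did.

def predicted_start_window(parked_at_work, weekday, past_starts_min):
    """Predict a departure window (minutes past midnight) from past weekday starts."""
    if not (parked_at_work and weekday):
        return None
    return min(past_starts_min) - 5, max(past_starts_min) + 5

def fmt(minutes):
    return f"{minutes // 60:02d}:{minutes % 60:02d}"

# Recorded weekday departure times for one vehicle, in minutes past midnight
past_starts = [17 * 60 + 28, 17 * 60 + 33, 17 * 60 + 40, 17 * 60 + 31]
actual_start = 17 * 60 + 38  # what the live data shows today

window = predicted_start_window(parked_at_work=True, weekday=True, past_starts_min=past_starts)
if window:
    low, high = window
    hit = low <= actual_start <= high
    print(f"predicted {fmt(low)}-{fmt(high)}, actual {fmt(actual_start)}, hit={hit}")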

Back-test

ERP systems can use ML to predict next year's holiday season based on last year's, plus the year-over-year growth in the first six months. Use the same math to "predict" last holiday season from the year before it, and compare that prediction to what actually happened. This applies to all kinds of inventory preparation. You can do the same thing when predicting which products or movies people should like: split their history in half, generate predictions from the first half and compare them to the actual reviews in the second half.
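
Here is a minimal back-test sketch with made-up numbers: the same growth formula is applied to a season whose actual results are already known, and the error is measured.

def holiday_forecast(prior_holiday_units, jan_jun_prior, jan_jun_current):
    """Scale the prior holiday season by year-over-year growth in the first six months."""
    growth = sum(jan_jun_current) / sum(jan_jun_prior)
    return prior_holiday_units * growth

# Back-test: forecast the 2023 holiday season from 2022 data, then compare
# it to the 2023 actuals already on hand.
forecast_2023 = holiday_forecast(
    prior_holiday_units=12_000,                           # holiday 2022 actual
    jan_jun_prior=[900, 950, 1000, 980, 1020, 1050],      # Jan-Jun 2022
    jan_jun_current=[980, 1010, 1090, 1060, 1120, 1140],  # Jan-Jun 2023
)
actual_2023 = 13_400
error_pct = abs(forecast_2023 - actual_2023) / actual_2023 * 100
print(f"forecast {forecast_2023:,.0f} vs actual {actual_2023:,}: {error_pct:.1f}% off")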

Check the output

The model might be correct, but it's always possible there is an error in the output or the extract, transform and load program. That kind of error could result in a downstream program having the wrong information, leading to a bad recommendation.
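
A simple reconciliation between what the model produced and what actually landed downstream can catch that class of error. The tables and values in this sketch are hypothetical.

model_output = {  # recommendation scores the model produced
    "sku-1001": 0.92,
    "sku-1002": 0.87,
    "sku-1003": 0.41,
}
downstream = {  # what ended up in the table the storefront reads
    "sku-1001": 0.92,
    "sku-1002": 0.78,  # silently truncated or stale
}

missing = model_output.keys() - downstream.keys()
mismatched = {
    sku for sku in model_output.keys() & downstream.keys()
    if abs(model_output[sku] - downstream[sku]) > 1e-6
}
print("missing downstream:", sorted(missing))
print("values that changed in transit:", sorted(mismatched))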

Code it twice

Although writing the whole program twice might not be cost-effective, isolating the code that generates the predictions and writing just that piece twice could be surprisingly quick. With two implementations of the prediction running against the same data, you can compare their output and investigate any disagreement.

The movie Minority Report is built on the idea of comparing three different predictive algorithms, with a requirement that at least two agree. Two independently written predictive algorithms can surface errors in interpreting the specification as well as straight-up coding errors.
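
Here is a small sketch of that comparison: two independently written implementations of the same simple forecast, run against the same data, with an assertion that they agree. The three-period moving average is only a stand-in for your real prediction code.

def forecast_v1(sales, window=3):
    """Straightforward loop implementation."""
    recent = sales[-window:]
    total = 0
    for value in recent:
        total += value
    return total / len(recent)

def forecast_v2(sales, window=3):
    """Second implementation, written separately from the first."""
    return sum(sales[-window:]) / min(window, len(sales))

history = [120, 135, 128, 140, 150, 144]
a, b = forecast_v1(history), forecast_v2(history)
assert abs(a - b) < 1e-9, f"implementations disagree: {a} vs {b}"
print("both implementations agree:", a)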

Check the model

As Mark Twain quipped, there are three kinds of lies: lies, damned lies and statistics. There are, for example, entire webpages on how to get a spreadsheet to generate "hockey stick" predictive growth from almost any data. Understand how the formula behind the model works and why that particular formula was chosen.
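
One way to make that concrete is to extrapolate the same data with two different formulas and see how far apart they land. The quarterly numbers below are invented; the compounding fit is the kind of formula that pulls further and further ahead of the linear one the longer the extrapolation runs.

import math

quarters = [1, 2, 3, 4, 5, 6]
revenue = [100, 104, 103, 107, 110, 112]

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns a function of x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def compounding_fit(xs, ys):
    """Fit y = a * r**x by running the linear fit on log(y)."""
    log_fit = linear_fit(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(log_fit(x))

linear = linear_fit(quarters, revenue)
compounding = compounding_fit(quarters, revenue)
for q in (8, 12, 20):
    print(f"Q{q}: linear {linear(q):6.1f}   compounding {compounding(q):6.1f}")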

Using predictive models in testing

Karen Johnson's RCRCRC heuristic points us to look at changes that are recent, core, risky, configuration-sensitive, recently repaired and chronic. Historically, that was mostly done by an educated team at a whiteboard or in a mind-map tool. Today, with the help of AI and large language models, it's possible to gather those elements as discrete data points to predict what to test. That might involve log data, version control commits, test coverage analysis and defect tracking data from a tool such as Jira. The simplest part of that could be core: look at log data, reduce it, assign a ranking and then sort. The features that are used most often are the most important to customers.
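
The core piece, for example, can be as simple as counting feature usage in the logs. The log format and routes in this sketch are hypothetical; real log reduction would work from far more data.

from collections import Counter
import re

log_lines = [
    "2024-05-01T10:02:11 INFO route=/checkout user=u1",
    "2024-05-01T10:02:14 INFO route=/search user=u2",
    "2024-05-01T10:02:19 INFO route=/checkout user=u3",
    "2024-05-01T10:03:02 INFO route=/admin/reports user=u9",
    "2024-05-01T10:03:40 INFO route=/search user=u4",
    "2024-05-01T10:04:01 INFO route=/checkout user=u5",
]

# Reduce the log to usage counts per feature (route), then rank them.
usage = Counter()
for line in log_lines:
    match = re.search(r"route=(\S+)", line)
    if match:
        usage[match.group(1)] += 1

# The most-used features are the first place to spend testing effort.
for route, count in usage.most_common():
    print(f"{count:>3}  {route}")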

Another option is to tie a tool that predicts customer behavior to automated checks -- to run the predicted customer behavior through the software as a kind of test.
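
Here is a bare-bones sketch of that idea, with invented journeys and a stand-in for the test client; a real setup would pull the journeys from the prediction tool and drive the application through an API or UI harness.

predicted_journeys = [
    ["view:coffee_maker", "add_to_cart:coffee_maker", "checkout"],
    ["search:filters", "view:filters", "add_to_cart:filters", "checkout"],
]

class FakeApp:
    """Stand-in for a real test client (an HTTP driver, UI harness, etc.)."""
    def perform(self, step):
        # A real client would execute the step against the system and
        # report whether it succeeded.
        return True

def replay(journeys, app):
    """Run each predicted journey as a check; collect the steps that fail."""
    failures = []
    for journey in journeys:
        for step in journey:
            if not app.perform(step):
                failures.append((journey, step))
                break
    return failures

failed = replay(predicted_journeys, FakeApp())
print("failed journeys:", failed or "none")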

Most organizations don't have that wherewithal. For many, it wouldn't be in the realm of possibility even if the data were in place -- and it usually isn't. Still, small strides in that direction could generate test ideas, test data or better evaluations.
