The Future of AI Relies on Testing: Why 2024 Marks a Key Moment for QA Engineers

Testify
7 min read · Apr 8, 2024


In a significant case involving Air Canada, a passenger was misled by an AI chatbot about bereavement fares, the special ticket prices for those who have recently lost a family member. The chatbot said the passenger could apply for the discount even after buying the ticket, but this wasn't true according to Air Canada's actual policy. When the passenger found out the truth, they took the airline to court. The court sided with the passenger, ordering Air Canada to pay a little over $800 for the mistake and the trouble it caused.

Furthermore, Google's Gemini AI faced significant backlash after its image generation tool produced historically inaccurate and biased content, generating images that misrepresented historical figures and scenarios, such as showing Nazis and America's Founding Fathers as people of color. Sergey Brin, co-founder of Google, admitted that this was "mostly due to not thorough testing".

These two incidents are just the tip of the iceberg, and we can expect more of them if the importance of properly testing AI systems goes unrecognized. While there is plenty of buzz about creating AI tools, not enough attention is being paid to their quality. This gap is a significant opportunity for QA engineers to shine by focusing on the quality of these tools and applications. Learning about AI's underlying technology not only enhances our skill set but also positions us to ensure these innovations truly meet quality standards.

While 2023 was all about creating AI, 2024 could be the year for thoroughly testing and improving these tools.

The Growing Complexity of AI Systems

AI algorithms and models are no longer simple decision trees or basic neural networks; they have evolved into intricate systems capable of learning and adapting in ways that try to mimic human intelligence. This leap in complexity is not without its challenges. As these models delve deeper into layers of data and algorithms, the potential for errors, subtle or significant, grows rapidly. The complexity of these systems makes it difficult not only to predict outcomes but also to diagnose and correct issues when they arise.

This is where robust Quality Assurance practices come into play. In traditional software development, QA is a critical step in ensuring that the software performs as well as possible. In the world of AI, QA practices must evolve beyond traditional methods to address the unique challenges presented by complex AI systems. These practices are not just about finding bugs; they're about understanding the intricacies of AI behavior, ensuring that AI decisions are explainable, and validating that outcomes are ethically and socially responsible.

The Evolution of QA in AI Development

AI systems, by nature, learn and adapt, leading to outcomes that are not always predictable. This unpredictability necessitates a shift in QA methodologies, where testing must account for the AI’s learning ability and ensure that it makes decisions based on the right patterns learned from data.

To address these unique challenges, the industry has seen the introduction of new testing frameworks specifically designed for AI systems. These frameworks go beyond traditional testing methods to include techniques like data set evaluation, model behavior testing under unexpected conditions, and continuous monitoring of AI systems post-deployment to ensure they adapt correctly over time. Tools and platforms that facilitate such comprehensive testing are becoming essential components of the AI development lifecycle.
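
To make that concrete, here is a minimal sketch of one such technique: a metamorphic stability check asserting that a model's predictions should rarely flip under tiny input perturbations. The model, synthetic data, and tolerance below are hypothetical stand-ins for illustration, not a reference implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the model under test, trained on synthetic data.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

def check_prediction_stability(model, X, noise_scale=0.01, min_agreement=0.95):
    """Metamorphic test: tiny input perturbations should rarely flip predictions."""
    baseline = model.predict(X)
    perturbed = model.predict(X + rng.normal(scale=noise_scale, size=X.shape))
    agreement = float(np.mean(baseline == perturbed))
    assert agreement >= min_agreement, f"only {agreement:.1%} of predictions were stable"
    return agreement

X_test = rng.normal(size=(200, 4))
print(f"Stability: {check_prediction_stability(model, X_test):.1%}")
```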

With the advent of these new testing frameworks and methodologies, there has also been a significant emergence of AI-specific QA roles and skills within the industry. Professionals in this field now need a blend of skills, including a deep understanding of AI technologies, data science, ethical considerations, and even domain-specific knowledge. The role of an AI QA engineer has become more interdisciplinary, bridging the gap between traditional QA practices and the cutting-edge needs of AI system testing.

Key Challenges in AI Testing

At the core of AI testing challenges lie three critical issues: data bias, model transparency, and the unpredictability of AI behavior. Data bias can skew AI decision-making, leading to outcomes that are unfair or not representative of the intended user base. This issue stems from the data the models are trained on; if the data is not diverse or is skewed in any way, the AI’s decisions will reflect these biases. Model transparency, or the lack thereof, complicates this further. AI systems, especially those based on deep learning, are often seen as “black boxes” where the decision-making process is not easily understandable by humans. This opacity makes it difficult to diagnose and correct errors in AI behavior. Lastly, the unpredictability of AI, where systems may evolve in unexpected ways as they learn from new data, poses a significant challenge to ensuring consistent performance.
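
As one concrete way to detect data bias, the sketch below computes a demographic-parity gap: the difference in positive-prediction rates between groups. The predictions and group labels here are synthetic and deliberately skewed for demonstration; in practice they would come from your model and a held-out test set, and the tolerance would be a project-specific decision:

```python
import numpy as np

# Synthetic, deliberately skewed example: group "A" receives positive
# predictions more often than group "B".
rng = np.random.default_rng(0)
group = rng.choice(["A", "B"], size=1000)
predictions = rng.binomial(1, np.where(group == "A", 0.60, 0.45))

def demographic_parity_gap(predictions, group):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = {g: float(predictions[group == g].mean()) for g in np.unique(group)}
    return max(rates.values()) - min(rates.values()), rates

gap, rates = demographic_parity_gap(predictions, group)
print(f"Positive rate per group: {rates}, gap: {gap:.2f}")
if gap > 0.10:  # tolerance is a hypothetical, project-specific choice
    print("WARNING: parity gap exceeds the tolerance set for this check")
```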

Creating realistic test scenarios for AI systems is another difficult task. In the real world, AI systems encounter a vast range of inputs and situations, many of which are difficult to predict and replicate in a testing environment. This “realism gap” can lead to AI systems that perform well in testing but falter in real-world applications, where unexpected inputs or scenarios can lead to errors or inappropriate responses.
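
Property-based testing is one way to narrow this realism gap: instead of hand-picking inputs, a tool such as Hypothesis generates thousands of unexpected ones. The sketch below feeds arbitrary text, including emoji, control characters, and empty strings, into a hypothetical AI-backed intent classifier and checks an invariant that must hold no matter what comes in:

```python
from hypothesis import given, strategies as st

VALID_LABELS = {"refund", "other", "unknown"}

def classify_intent(text: str) -> str:
    """Hypothetical stand-in for an AI-backed classification endpoint."""
    cleaned = text.strip().lower()
    if not cleaned:
        return "unknown"
    return "refund" if "refund" in cleaned else "other"

@given(st.text())  # generates adversarial strings: emoji, control chars, long inputs
def test_always_returns_a_known_label(text):
    assert classify_intent(text) in VALID_LABELS
```

Run under pytest, Hypothesis will shrink any failing input down to a minimal counterexample, which makes the "unexpected input" easy to reproduce and fix.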

The fast-paced evolution of AI technology adds another layer of complexity to QA practices: the challenge of continuous testing and integration. AI systems are not static; they learn and adapt over time, which means they need to be continuously tested to ensure they are making decisions correctly. This continuous testing must be integrated seamlessly into the development process without slowing down innovation. Balancing the need for rigorous QA with the pace of AI development requires innovative approaches to testing and integration, ensuring that AI systems can be updated and improved without compromising on quality.
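
In a CI pipeline, this can take the form of a regression gate that blocks deployment when a candidate model scores worse than the current baseline. The sketch below is a minimal, self-contained version; the baseline figure, tolerance, and synthetic data are hypothetical, and a real pipeline would load the candidate model and a frozen evaluation set instead:

```python
# test_model_regression.py -- intended to run on every CI build (e.g. via pytest).
import numpy as np
from sklearn.linear_model import LogisticRegression

BASELINE_ACCURACY = 0.90  # hypothetical score of the currently deployed model
TOLERANCE = 0.01          # how much regression the team is willing to accept

# Synthetic stand-ins for a candidate model and a frozen evaluation set.
rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 3))
y = (X @ np.array([1.0, -0.5, 0.2]) > 0).astype(int)
X_eval, y_eval = X[:200], y[:200]
candidate = LogisticRegression().fit(X[200:], y[200:])

def test_candidate_does_not_regress():
    accuracy = float(np.mean(candidate.predict(X_eval) == y_eval))
    assert accuracy >= BASELINE_ACCURACY - TOLERANCE, (
        f"candidate accuracy {accuracy:.3f} fell below the gate; blocking deploy"
    )
```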

The Role of Human Oversight

Despite the strides made in AI capabilities, the nuanced judgment and ethical considerations humans bring to the table remain, in my opinion, irreplaceable. At the heart of AI QA processes, human judgment serves as the cornerstone for ensuring that AI systems operate within ethical boundaries and make decisions that align with human values. Humans possess the unique ability to understand context, make ethical judgments, and foresee consequences in ways that AI currently cannot. This becomes especially crucial in scenarios where AI decisions have significant ethical implications, such as in healthcare or criminal justice. Arguing for the irreplaceable role of human oversight is not to undermine the capabilities of AI but to highlight the complexity and ethical considerations that underpin many AI applications.

The collaboration between humans and AI in testing and QA roles exemplifies how combining these distinct approaches can lead to more reliable and effective outcomes. Humans can provide the contextual understanding and ethical framework necessary for setting up realistic test scenarios and interpreting results. In contrast, AI can handle the volume and speed required for extensive testing, especially in continuous integration/continuous deployment (CI/CD) environments. This partnership allows for a comprehensive QA process that can quickly adapt to new data and evolving AI behaviors, ensuring that AI systems remain aligned with intended outcomes and ethical standards.

Preparing for the Future: QA in AI Development

The rapid evolution of AI technologies in 2023 and the anticipated advancements in 2024 underscore the importance of rigorous testing and ethical oversight in AI development. The incidents involving Air Canada and Google's Gemini AI highlight the consequences of neglecting thorough testing. These examples not only illustrate the pitfalls of rapid AI advancement without proper oversight but also signal the critical need for QA engineers to prioritize the integrity, reliability, and ethical standards of AI tools and systems.

Looking ahead, the challenge for the AI industry is to embrace and invest in enhanced QA practices and ethical frameworks. By doing so, organizations can help ensure that their AI systems not only push the boundaries of what's technologically possible but also align with societal values. This commitment to quality and integrity in AI will pave the way for technologies that not only innovate but also enrich lives and contribute positively to society.

If you’re interested in more content about Software Testing, QA, Artificial Intelligence and Digital Health, be sure to follow.

Website: https://www.therealtestify.com/

Twitter: https://twitter.com/TheRealTestify

Instagram: https://www.instagram.com/therealtestify/

YouTube: https://www.youtube.com/@therealtestify
