Henry Creator of pitest

Mutation Testing for AIs

Fast Local Loops For Machines and Humans

Lots of people are now noticing that mutation testing is a wonderful thing in a world where code is generated by large language models. It ensures the code is tested, and it helps fight code bloat by identifying redundant code. And, if you use the right tools, it’s fast, deterministic and cheap - no tokens need be consumed.

The main focus of our tooling has been around CI - reducing the cost and effort of maintaining code and test quality by putting information in the hands of developers working on pull requests. But we’ve had an increasing number of enquiries from customers wanting to integrate mutation testing directly with their chosen AI tools.

After some experimentation we think we’ve come up with a good answer.

What an AI wants is the same thing that a human wants - a fast local loop that gives actionable information with as little noise as possible.

We’d already built that, so we’ve made a few tweaks to package it up for easy consumption.

The Agent Plugin

Like any other pitest plugin, the agent plugin can be added to maven and gradle builds that already have pitest set up.

It requires pitest 1.25.3 or above.

Once installed, you can run it in two modes:

  • Recent
  • Gradual

It’s best run with verbosity set to SILENT and the maven -q flag.

Recent

mvn -q -Dverbosity=SILENT -DextraFeatures="+recent,+agent" test-compile org.pitest:pitest-maven:mutationCoverage

This analyses the lines of code touched by uncommitted changes (or the lines touched by the last commit if there is nothing to commit).
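To get a rough idea of which files a run is likely to look at, the same rule can be approximated with plain git. This is only a sketch: the function name `recent_java_changes` is made up for illustration, and it assumes that "uncommitted changes" means edits to tracked files, which may not match the plugin's real selection logic.

```shell
# Sketch only: approximates "uncommitted changes, else the last commit" with git.
# Assumption: untracked files are ignored; the plugin itself may treat them differently.
recent_java_changes() {
  if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
    # Staged and unstaged edits to tracked files, relative to the last commit.
    git diff HEAD --name-only -- '*.java'
  else
    # Clean working tree: fall back to the files touched by the most recent commit.
    git diff-tree --no-commit-id --name-only -r --root HEAD -- '*.java'
  fi
}
```

Running `recent_java_changes` from the project root before the mutation run gives a quick sanity check that the change you care about is the one that will be analysed.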

The command outputs the most interesting mutants directly to the console, e.g.

Priority of mutants is shown in []. Mutant [0] is the most important.

src/main/java/org/example/AnExampleClass.java

 16: public class AnExampleClass {
 17: 
 18:     public String doThings(int i) {
 19:         System.out.println("Mutate me!"); <-- [0] removed call to java/io/PrintStream::println SURVIVED

Tests classes covering these mutants 

* src/test/java/org/example/AnExampleClassTest.java
* src/test/java/org/example/RoundTripTest.java

More mutants may have been analysed than the ones shown, but only the most interesting *results are reported.
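The report is meant to be read as-is, but if you want to post-process it (for example to feed a script rather than a model), the annotated lines are easy to pick out. The sketch below assumes the layout stays exactly as in the example above, with a `<-- [n] description SURVIVED` marker on the mutated line. The function name `surviving_mutants` is hypothetical, and statuses other than SURVIVED are ignored.

```shell
# Sketch only: extracts "priority<TAB>line<TAB>description" from the agent report.
# Assumption: the report layout matches the example output shown above.
surviving_mutants() {
  awk '
    / <-- \[[0-9]+\] .* SURVIVED *$/ {
      line = $1; sub(/:$/, "", line)        # "19:" becomes "19"
      desc = $0
      sub(/^.* <-- \[/, "", desc)           # drop the source text and the marker
      prio = desc; sub(/\].*$/, "", prio)   # keep the number inside the brackets
      sub(/^[0-9]+\] /, "", desc)           # keep only the mutant description
      sub(/ SURVIVED *$/, "", desc)
      printf "%s\t%s\t%s\n", prio, line, desc
    }' | sort -n                            # most important mutant first
}
```

Usage would be to pipe the mvn command's output into the function, e.g. `mvn ... | surviving_mutants`.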

Gradual

If you’re not currently working on a change, but instead want to make improvements to an existing codebase, gradual mode comes into play.

mvn -q -Dverbosity=SILENT -DextraFeatures="+gradual,+agent" test-compile org.pitest:pitest-maven:mutationCoverage

This works in exactly the same way, surfacing the most interesting mutants, but now they are selected from the entire codebase.
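Since the two invocations differ only in the feature name, a small wrapper can save some typing. This is a convenience sketch, not part of the plugin: the function name `pitest_agent` and the `MVN` override variable are made up for illustration.

```shell
# Sketch only: runs the agent plugin in the requested mode ("recent" or "gradual").
# MVN is a hypothetical override, e.g. MVN=./mvnw for a project using the maven wrapper.
pitest_agent() {
  mode=$1
  case "$mode" in
    recent|gradual) ;;
    *) echo "usage: pitest_agent recent|gradual" >&2; return 2 ;;
  esac
  ${MVN:-mvn} -q -Dverbosity=SILENT "-DextraFeatures=+${mode},+agent" \
    test-compile org.pitest:pitest-maven:mutationCoverage
}
```

`pitest_agent recent` then runs the command shown in the Recent section, and `pitest_agent gradual` runs the one above.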

While it might be possible to mutate the whole codebase if it is small, gradual mode assumes that doing so would be too expensive. Instead, new mutants are analysed each time it’s run, maintaining an incremental analysis file to ensure we never do the same work twice. If you run it enough times, every possible mutant will eventually be analysed, in priority order.
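The idea is easier to see in miniature. The sketch below is not how pitest stores or reads its history; it uses two plain text files as stand-ins (a priority-ordered list of mutant ids and a list of ids already analysed), and the function names and file layout are invented for illustration.

```shell
# Sketch only: batch-wise incremental analysis with a history file.
# Assumptions: $1 lists one mutant id per line, highest priority first;
# $2 is a stand-in history file holding the ids analysed by earlier runs.
next_batch() {
  all_mutants=$1
  done_file=$2
  batch_size=$3
  touch "$done_file"
  # Keep priority order, drop anything already analysed, take the next few.
  grep -vxFf "$done_file" "$all_mutants" | head -n "$batch_size"
}

run_once() {
  batch=$(next_batch "$1" "$2" "$3")
  [ -n "$batch" ] || return 0        # everything has been analysed already
  printf '%s\n' "$batch" >> "$2"     # record the work so it is never repeated
  printf '%s\n' "$batch"             # the mutants analysed by this run
}
```

Each call to `run_once` handles a fresh batch, and once the list is exhausted further calls do nothing, which is the behaviour described above.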

The mutants that are reported to the console will most likely be the same ones each time. They’ll only stop being shown when they become uninteresting because:

  • A test has been written to kill them
  • The code has been refactored to make them go away
  • They’ve been marked as equivalent

Equivalent Mutants

The agent plugin introduces a new way to deal with equivalent mutants. Arcmutate’s approach has always been to try to avoid producing them through a mix of heuristics and mechanisms such as exclusions.

To support the agent plugin we’ve introduced a new approach. There is now an EQUIVALENT status that can be applied to a mutant by editing the history file.

And the great thing is that you can get your chosen language model to do this for you by generating a small patch file that’s applied to the history file on the next run.

Claude Plugin

We’ve wrapped all this up in a Claude skill. At the moment it only supports single module maven projects, but as all the heavy lifting is done by the agent plugin, it should be easy to remix things to support gradle and more complex projects.

Thanks for reading. Check out our industrial quality mutation testing tools for the JVM.