Henry, creator of pitest

History Patterns for CI

We’re very pleased with Arcmutate’s new incremental analysis feature. It opens up new patterns of working that are not possible with pitest’s built-in incremental analysis. This post looks at the different ways it can be used on a CI server and discusses the pros and cons of each approach. A future post will look at how it can be used as part of local development.

All patterns require history files to be stored and somehow associated with the different branches. We are aware of customers using various approaches, including file shares, S3 buckets, and publishing artifacts to Nexus. Our earlier blog post shows how the GitHub Actions cache can be used.

If coverage metrics are not required for other purposes, Arcmutate’s run_tests parameter should be set to false for all the patterns below to increase the efficiency of the analysis on the PR branches.

Super Fast PRs with Git Integration

Combining incremental analysis with git integration can further speed up mutation testing. Git integration on its own ensures that only the changed parts of the codebase are analysed within a branch. Adding incremental analysis also ensures that each change is analysed only once.

There are several variations on this pattern, with different pros and cons.

Analyse Only Pull Requests

diagram showing history files being shared

  • In each PR branch, start with no history file
  • Run pitest against the incoming diff on the first push to CI
  • Reuse the branch history file for each subsequent push, storing the output for the next one
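The per-branch bookkeeping in the steps above can be sketched as follows. This is a minimal illustration, not Arcmutate's implementation: `HistoryStore` and `history_for_push` are hypothetical names, an in-memory dict stands in for whatever storage is actually used (file share, S3 bucket, CI cache), and a history file is treated as opaque bytes.

```python
from typing import Dict, Optional


class HistoryStore:
    """Hypothetical stand-in for a file share, S3 bucket or CI cache."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def load(self, branch: str) -> Optional[bytes]:
        # None means "no history yet", i.e. the first push on a new PR branch.
        return self._files.get(branch)

    def save(self, branch: str, history: bytes) -> None:
        # Overwrite so the next push on this branch picks up the latest file.
        self._files[branch] = history


def history_for_push(store: HistoryStore, branch: str) -> Optional[bytes]:
    """Return the history file to feed into the analysis for this push."""
    return store.load(branch)


store = HistoryStore()
first = history_for_push(store, "feature/login")   # no file yet: analyse the diff from scratch
store.save("feature/login", b"history-after-push-1")
second = history_for_push(store, "feature/login")  # reuses the file written after push 1
```

Because files are keyed by branch name alone, nothing carries over to a new branch, which is why the history in this pattern stays partial.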

This is simple to set up, and more efficient than using Git integration alone. A full analysis is never required, so the technique can be applied to codebases of any size.

The history files generated are “partial” (i.e. they only contain information about the mutants generated from the diffs on the branch). As the files are abandoned when the branch is merged, these mutants will need to be re-analysed from scratch if the same code is modified in a new branch in the future.

Standard caveats about inaccuracies introduced by using incremental analysis apply.

Analyse Pull Requests and Main

diagram showing history files being shared

  • Run a full analysis on main and store the history file
  • On each PR branch, start with the history file from main and analyse only changes using Git integration
  • Store the output history file against the PR branch and reuse from that point onwards
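The choice of input file in the steps above amounts to a simple fallback: use the branch's own history if one exists, otherwise start from main's. A minimal sketch, assuming history files are stored under the branch name in a dict (the `select_history` function and the storage layout are hypothetical, not part of Arcmutate):

```python
from typing import Dict, Optional


def select_history(stored: Dict[str, bytes], branch: str, main: str = "main") -> Optional[bytes]:
    """Pick the input history file for a push on `branch`.

    Prefers the branch's own file and falls back to the one produced by
    the full analysis on main.
    """
    if branch in stored:
        return stored[branch]
    return stored.get(main)


stored = {"main": b"full-history-from-main"}
first_push = select_history(stored, "feature/search")  # no branch file yet, so main's is used
stored["feature/search"] = b"branch-history"            # output stored after the first push
later_push = select_history(stored, "feature/search")  # the branch file now takes precedence
```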

All history files will be “complete”, so this is more efficient than analysing PRs only. With the PR-only approach, any commit on a PR branch that touches different code from the previous commits triggers an analysis of the modified lines that is not accelerated by history. With complete history files, all changes to the code will be accelerated.

The major downside of this approach is that it requires a full analysis to be run at least once against the project. For large projects this may not be practical.

Ratcheting CI

If you have a codebase where running a complete analysis is impractical even on an occasional basis, Arcmutate lets you use git integration to build up a history file gradually over time. It may never become a “complete” file, but it will contain history information for all portions of the codebase that are modified regularly. This pattern is not possible using pitest’s built-in incremental analysis.

It works like this.

diagram showing history files being shared

  • Start with an empty history file on the main branch
  • In each pull request branch, analyse only the changed files, using the current history file from main as a starting point
  • When the branch merges in, re-analyse just the incoming changes, using the main history file as both input and output

In teams where there are not many concurrently open pull requests, the history file from the PR branch could simply be used to overwrite the one on main when merging. This would, however, lose some history information if a branch that had been forked before these changes were accepted were merged later.
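The difference between the two merge strategies can be illustrated with a toy model. This is only a sketch under simplifying assumptions: a history file is represented as a dict from a made-up mutant id to its last known status, and re-analysing the incoming changes against main's file is approximated by updating main's entries with the branch's. Real history files and Arcmutate's handling of them are more involved.

```python
from typing import Dict

History = Dict[str, str]  # hypothetical model: mutant id -> last known status


def overwrite_on_merge(main: History, branch: History) -> History:
    """Shortcut: the branch's history file replaces main's."""
    return dict(branch)


def reanalyse_on_merge(main: History, incoming: History) -> History:
    """Main's file is both input and output, so existing entries survive."""
    updated = dict(main)
    updated.update(incoming)
    return updated


# Both branches fork while main's history is still empty.
branch_a = {"Foo#1": "KILLED"}
branch_b = {"Bar#1": "SURVIVED"}

# A merges first, then B, which never saw A's results.
main_overwrite = overwrite_on_merge(overwrite_on_merge({}, branch_a), branch_b)
main_reanalyse = reanalyse_on_merge(reanalyse_on_merge({}, branch_a), branch_b)
```

In this model `main_overwrite` ends up holding only the entry from branch B, while `main_reanalyse` keeps the entries from both branches.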

Conclusion

The best way of working will vary from team to team. If your codebase is small enough, it is probably best to stick to using git integration alone, but for larger projects or codebases with slower test suites, adding history files can make a big difference to feedback times.

Teams without access to the git integration available with Pro can still use Arcmutate’s history implementation on CI with a scheme similar to the second one shown. They will, however, need to devise their own mechanism for displaying the results.

Thanks for reading. Check out our industrial quality mutation testing tools for the JVM.