build-o-tron

i don’t particularly trust that GitHub will be where i want my authoritative repos to exist, and i trust free CI on someone else’s infrastructure even less. sr.ht exists, and forgeperf.org is compelling! but i have infrastructure at home and i’d like to get precise (ish) perf measurements when validating changes to sensitive code like yaxpeax-x86. and hey, what is a CI system but a miserable pile of bash scripts anyway?

a wishlist

if i’m building a fit-to-purpose CI system, what is it that i’d like to fit to? first, conceptual orientation: CI systems are typically oriented around “builds” and “test runs”, or perhaps a more general “run”. i’m actually more interested in validation of some combination of code and data - initial steps might be the same build-and-test as anywhere else, but a holistic validation of a change would also include what is learned about it later.

as an example, i would find it unsatisfying to claim a library’s CI is “passing” when it turns out a downstream dependent is broken by a change. in that circumstance, i’d want to associate the later issue with the change just the same, and potentially change the way validation is done to avoid that issue in the future. after all, use of a library is itself partial validation that the library is fit for some purpose!
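
to make that concrete, here’s roughly the shape i have in mind - a sketch with made-up names, not anything build-o-tron actually defines - where results keep accruing against a change after the fact:

```rust
// a hypothetical data model (names are mine, not build-o-tron's): the point
// is that a change accumulates validation results over time, including
// results discovered long after the change landed, like a downstream
// dependent breaking.
use std::time::SystemTime;

struct Change {
    repo: String,
    commit: String,
}

enum Outcome {
    Pass,
    Fail { detail: String },
}

struct ValidationResult {
    change: Change,
    // what was validated: build, test, fuzzing, a downstream dependent, ...
    kind: String,
    outcome: Outcome,
    // when this was recorded - possibly well after the change merged
    recorded_at: SystemTime,
}
```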

validation can be done by arbitrary machines

as seen on GitHub, the ✔️ on commit statuses serves as an attestation that some validation is okay with a revision of some code and data. i’m not pressed about who provides that attestation, except that it should come from computers i trust to faithfully run all relevant validation.

in GitHub terms, this is the same as noting that i could run self-hosted Runners. and sure, nothing’s stopping me from running a Jenkins agent on any old computer or VM to do builds and validation.

perhaps a more interesting turn here would be to say that you trust my changes are ones you would want to pull and validate, or that i trust that your runner is able to faithfully validate my changes.
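
what would such an attestation carry? a hedged sketch, assuming trust gets expressed through keys i’ve chosen to accept (every name here is hypothetical):

```rust
// hypothetical: an attestation says "this machine ran this validation
// against this revision and vouches for the result", signed by a key i
// trust. who signed matters more than whose rack the machine sits in.
struct Attestation {
    commit: String,
    validation: String,      // e.g. "build-and-test", "fuzz-24h"
    passed: bool,
    runner_identity: String, // a public key fingerprint, say
    signature: Vec<u8>,      // over the fields above
}
```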

validation is not one-and-done

build-and-test cycles are typically fast: some number of minutes, maybe hours, and you find out if something works. some forms of validation can be far more expensive, like comprehensive fuzzing, or “translate-to-SMT-and-validate”. further, the confidence-for-time-spent of validating a change can vary wildly depending on what is being validated. ten minutes of build-and-test may catch many issues, while ten minutes of fuzzing is somewhat less interesting.

ideally, a CI system should be able to prioritize resource-constrained jobs and not waste time re-running fuzz jobs for a trivial documentation change. conversely, the model for a validation should be able to express expected future validations that may not even have been planned yet. it should be able to indicate to users (like me!) that even if there is no current ongoing work related to a change, there may be future work.

combined with the above, some validation may, ideally, be done on many different computers, architectures, operating systems, etc. this is not true in all cases! but the system should support this well.
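
one way to model this - a sketch of the idea, not something build-o-tron implements as written - is for a validation to describe its cost, whether it is ever really “done”, and where it wants to run:

```rust
use std::time::Duration;

// hypothetical job descriptor with enough information to schedule sensibly:
// a documentation-only change can skip the fuzzer entirely, and a fuzz job
// is never "done", just "run for this long so far".
enum Completion {
    // runs to a verdict, like build-and-test
    Bounded,
    // accrues confidence over time, like fuzzing; can always resume later
    OpenEnded { time_spent: Duration },
}

struct ValidationJob {
    name: String,
    completion: Completion,
    // coverage this job wants; scheduling can fan out across whatever
    // machines exist today and fill in the rest as they appear
    targets: Vec<String>, // e.g. "x86_64-unknown-linux-gnu"
    // cheap-but-informative work first when resources are scarce
    priority: u8,
}
```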

first-class metrics

as i’m building and validating software i’ll likely want to collect metrics. for example, when hacking on yaxpeax-x86, i want to keep an eye on overall performance of instruction decoding, as well as total code size with and without instruction formatting support built in.

these kinds of metrics vary from target to target (codegen differences, target feature differences), from OS to OS (ABI differences, let alone disk access and power management differences), and from compiler version to compiler version. this is of course non-exhaustive.

wherever validation is done, runners should be able to post metrics back to more durable storage that can retain a log of information related to a change.
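
as a sketch of the shape such a record could take - field names here are assumptions, not build-o-tron’s actual schema - a runner might report:

```rust
// hypothetical metric record a runner posts back to durable storage,
// keyed by everything that can change the number: commit, target, OS,
// compiler version, and the machine that measured it.
struct MetricSample {
    commit: String,
    metric: String,   // e.g. "decode-throughput", "code-size-bytes"
    value: f64,
    target: String,   // e.g. "x86_64-unknown-linux-gnu"
    os: String,
    compiler: String, // e.g. "rustc 1.70.0"
    host: String,     // which runner measured this
}
```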

i don’t know everything i want to validate against up front

if a new version of an operating system is released, it’s reasonable to want to revalidate old commits against the new system - has something broken, and does this help narrow down when the breakage occurred? or does everything continue working, and the new results reinforce that fact?

if a new compiler or microarchitecture comes into existence, does it yield different performance characteristics? how much so? are previous optimizations still beneficial, or are they not anymore?

these are questions i want to ask surprisingly often. since i’d already have numbers to collect in these scenarios (see also: first-class metrics), this ought to be as simple as setting up a runner on the new system, having it identify itself as such to the CI system, and receiving jobs until a new row in the matrix is filled out.
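
filling out that matrix could be as simple as diffing desired coverage against recorded results - a sketch under assumed types, not build-o-tron’s actual scheduler:

```rust
use std::collections::HashSet;

// hypothetical: given commits worth validating and the configurations a new
// runner offers, yield the (commit, configuration) cells with no recorded
// result yet. a new OS or microarchitecture is just a new configuration
// string, and old commits become eligible for it automatically.
fn missing_cells(
    commits: &[String],
    configs: &[String],
    done: &HashSet<(String, String)>,
) -> Vec<(String, String)> {
    let mut cells = Vec::new();
    for commit in commits {
        for config in configs {
            let cell = (commit.clone(), config.clone());
            if !done.contains(&cell) {
                cells.push(cell);
            }
        }
    }
    cells
}
```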

sure would be nice to check CI locally!!

GitHub Actions and… every other CI solution… tend to push you towards programming some YAML or Groovy, pushing a change, and twiddling your thumbs while a VM you don’t have access to runs and hopefully does not explode. it’s hard to debug, it’s hard to validate in a quick and iterative way, and it’s just not pleasant. whatever i’d call “good” here should not feel like sandpaper to adjust.

ideally this would be something like being able to take whatever runner exists and just run it against a local CI configuration. maybe with a little setup to get there.

i don’t want to have to work against GitHub

while many people expect to use or see GitHub, and good integration with GitHub’s statuses is nice, i’d actually prefer email notifications that builds are running or completed. this is doubly true when builds may start or complete for reasons fully unrelated to code changes, like simply making a new test configuration available. in an ideal world, cgit or whatever else could be made to include links to whatever validation is available for a commit of interest.

in practice

build-o-tron gets me enough of the way here that i don’t think about it, but it’s far from ideal. the yaxpeax-x86 goodfile defines how commits to that repo are exercised, complete with code size and performance measurements. actual builds (70f7673) come from various points in time across different machines: an older, lower-power Intel test machine, a more recent AMD Zen 2 workstation, and, some time later, a slate of AWS VMs to fill out Zen 3, Zen 4, and corresponding eras of Intel microarchitectures.

the index page itself, https://ci.butactuallyin.space/, is rudimentary but gets the job done. there’s no notion of virtualization or containerization; builds could depend on host configuration or tamper with build hosts, and “escaping onto the network” isn’t even a meaningful concern because there’s no containment to escape.

pushes to GitHub result in checks, and the integration can push status updates, but it is … clearly not operationally mature.

early on there was a questionable bug where i failed to poll a readable properly, so channels would hang. there is still a bug that results in connections between runners and the CI coordinator getting lost; the builds then eventually time out and retry. for as much as it is “good enough”, it is very much a case of living with the consequences of my own decisions and limitations.