AutoZyme

About AutoZyme

AutoZyme is a framework for producing drop-in performance upgrades to widely used scientific toolkits. It combines two ingredients:

  • An autonomous-research loop that proposes, implements, and benchmarks candidate optimizations.
  • A concordance gate that rejects any change whose outputs diverge from the upstream baseline beyond a method-appropriate tolerance.
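
As a rough illustration of the second ingredient, a concordance gate can be sketched as a check that accepts an optimized output only if it stays within a tolerance of the upstream baseline. The function names `max_relative_error` and `passes_concordance`, the max-relative-error metric, and the default tolerance are assumptions made for this example. AutoZyme's actual metrics and tolerances are method-specific and are not described on this page.

```python
import math
from typing import Sequence


def max_relative_error(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Largest element-wise relative deviation of candidate from baseline."""
    if len(baseline) != len(candidate):
        return math.inf  # a shape mismatch counts as maximal divergence
    worst = 0.0
    for b, c in zip(baseline, candidate):
        if math.isnan(b) or math.isnan(c):
            return math.inf  # NaN in either output is treated as a failure in this sketch
        scale = max(abs(b), 1e-12)  # guard against division by zero
        worst = max(worst, abs(c - b) / scale)
    return worst


def passes_concordance(
    baseline: Sequence[float],
    candidate: Sequence[float],
    tolerance: float = 1e-6,  # hypothetical default, not an AutoZyme setting
) -> bool:
    """Accept the candidate only if it stays within tolerance of the baseline."""
    return max_relative_error(baseline, candidate) <= tolerance
```

A single relative-error threshold is the simplest possible choice. A method with stochastic or discrete outputs (for example, cluster labels) would need a different metric, which is what "method-appropriate tolerance" implies.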

Current releases

Datasets

Every benchmark is run on at least one of the following publicly available single-cell datasets, which range from about 14k to about 486k cells:

Dataset | Cells | Source | Used for
ifnb | 14k | Kang et al., Nature Biotechnology 2018 · GSE96583 | Integration (CCA / RPCA), SCTransform
pbmc68k | 68k | Zheng et al., Nature Communications 2017 · 10x Genomics | Small-scale benchmark (all core methods)
pbmc200k_glaucoma | 208k | CZ CELLxGENE · Human PBMC Glaucoma Atlas | Medium-scale benchmark, batch HVG
heart_adult | 486k | Litviňuková et al., Nature 2020 | Large-scale benchmark (>36 GB RAM)

Authors

Elliot Xie
Lead developer · maintainer
University of Wisconsin–Madison

The contributor list is updated with each release; see each repository's CONTRIBUTING.md file and commit history. If you'd like to be listed, open a PR.

Code

AutoZyme is fully open source. The three canonical repositories are:

Contributions are welcome. To propose a toolkit to accelerate, use the Suggest & Vote page. To submit an optimization for a method that's already in scope, open a pull request on the relevant package repo. Its CONTRIBUTING.md describes the benchmark contract (pinned environment, dataset, concordance metric, and tolerance) that a submission must satisfy.
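
To make the four parts of the benchmark contract concrete, here is a minimal sketch of how a submission might record them. The class name `BenchmarkContract`, its field names, and the `validate` helper are hypothetical. The authoritative format is whatever the relevant repository's CONTRIBUTING.md specifies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BenchmarkContract:
    """Hypothetical container for the four contract parts named above."""

    environment: str         # identifier of the pinned environment
    dataset: str             # dataset name, e.g. "pbmc68k"
    concordance_metric: str  # name of the metric used to compare outputs
    tolerance: float         # maximum allowed divergence under that metric

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems: List[str] = []
        for name in ("environment", "dataset", "concordance_metric"):
            if not getattr(self, name).strip():
                problems.append(f"{name} must not be empty")
        if not (self.tolerance >= 0.0):  # this form also rejects NaN
            problems.append("tolerance must be a non-negative number")
        return problems
```

Freezing the dataclass means a contract cannot be altered after it is created, so the same record can describe both the baseline and the optimized run.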

How we choose what to work on next

On the Suggest & Vote page, anyone can nominate a toolkit or method. Each nomination can be upvoted, downvoted, and commented on. We periodically review the ranking and pick top-voted entries to optimize next. There is no fixed cadence; progress depends on the complexity of each method.

How benchmarks are reported

Every benchmark row is one method × dataset × thread count combination, run on a fixed hardware profile. The optimized run must pass a concordance check to be reported at all. The Benchmarks page shows the raw numbers for every run; the homepage aggregates them into per-method best speedups.
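
The reporting rule above can be sketched as a small aggregation: rows that fail the concordance check are dropped, and the remaining rows are reduced to the best speedup per method. The `BenchmarkRow` fields, the `best_speedups` function, and the definition of speedup as baseline time divided by optimized time are assumptions for illustration. The timings in the example are invented and are not AutoZyme results.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BenchmarkRow:
    """One method x dataset x thread-count combination (hypothetical schema)."""

    method: str
    dataset: str
    threads: int
    baseline_seconds: float
    optimized_seconds: float
    concordant: bool  # result of the concordance check for this run


def best_speedups(rows: Iterable[BenchmarkRow]) -> Dict[str, float]:
    """Best speedup per method, counting only rows that passed concordance."""
    best: Dict[str, float] = {}
    for row in rows:
        if not row.concordant:
            continue  # non-concordant runs are not reported at all
        speedup = row.baseline_seconds / row.optimized_seconds
        if speedup > best.get(row.method, 0.0):
            best[row.method] = speedup
    return best


# Invented example rows, for illustration only.
rows = [
    BenchmarkRow("method_a", "pbmc68k", 1, 100.0, 50.0, True),
    BenchmarkRow("method_a", "pbmc68k", 8, 40.0, 10.0, True),
    BenchmarkRow("method_b", "ifnb", 1, 60.0, 20.0, False),
]
print(best_speedups(rows))  # {'method_a': 4.0}
```

Here `method_b` is missing from the result because its only run failed the concordance check, regardless of how fast it was.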

Status

AutoZyme is under active development. Details of the search loop and the results shown here are the subject of an in-progress manuscript.

How to cite

@misc{autozyme2026,
  title  = {AutoZyme: Autonomous-Research-Driven Speedups for Scientific Toolkits},
  author = {The AutoZyme Team},
  year   = {2026},
  note   = {Manuscript in preparation}
}