Reference¶
Every flag, marker, fixture, CLI command, and public function. The benchmem CLI
options are rendered live from --help below; everything else — pytest flags, the
marker, the fixture, the blob schema, the Python API — is curated here. For the
narrative versions see Getting started,
Metrics, Dims, and Compare & plot.
pytest command-line flags¶
The plugin adds these to any pytest run (alongside pytest-benchmark's own flags):
| Flag | Default | What |
|---|---|---|
| --benchmark-memory | off | record peak memory for every benchmark() call, no test changes. (The benchmark_memory fixture is always measured, with or without this flag.) |
| --benchmark-memory-repeats=N | 1 | default memray passes per benchmark, suite-wide (reported peak is the min). Per-test @pytest.mark.benchmem(repeats=N) overrides it. |
| --benchmark-memory-columns=… | peak | which memory metrics the table shows, comma-separated and in order: peak, allocated, allocs. Default is peak only; the table captions the rest as available. |
| --benchmark-memory-stats=… | min,mean,max | when a benchmark is measured more than once (repeats > 1), the stats each shown metric spreads into: min, mean, max, median, stddev. A single pass stays one column. |
| --benchmark-memory-compare[=REF] | off | compare this run's peak memory against a prior saved run (latest, or a pytest-benchmark storage ref like 0001); folds base + Δ peak columns into the combined table. |
| --benchmark-memory-compare-fail=FIELD:THRESHOLD | — | fail the session on a memory regression (repeatable). Implies --benchmark-memory-compare. Fields: peak, allocated, allocations. |
Timing regressions still use pytest-benchmark's own --benchmark-compare /
--benchmark-compare-fail; the --benchmark-memory-compare* flags are the memory
mirror. Their baseline comes from pytest-benchmark's storage (.benchmarks/) — save
one first with --benchmark-save=NAME or --benchmark-autosave, or the gate finds
nothing and passes. See Gate CI on regressions.
The benchmem marker¶
@pytest.mark.benchmem(repeats=3)
def test_build(benchmark_memory):
    ...
| Kwarg | Default | What |
|---|---|---|
| repeats | 1 | measure this test with N memray passes. Every pass is kept (the blob stores the whole series); the headline peak is the minimum across them, and --stat reports any other. Overrides the suite-wide --benchmark-memory-repeats for this test. |
| max_peak | — | fail the test if the headline peak exceeds this absolute ceiling. A size string ("100MiB", units B/KiB/MiB/GiB) or a bare int (bytes). |
| max_allocated | — | as max_peak, on allocated (total bytes). |
| max_allocations | — | as above, on the allocations count: a bare number (no unit). |
Absolute ceilings — max_peak / max_allocated / max_allocations¶
@pytest.mark.benchmem(max_peak="100MiB", max_allocations=5000)
def test_build(benchmark_memory):
    benchmark_memory(build_model, 1000)
A baseline-free guardrail: the test fails if the measured headline metric exceeds the
ceiling (test_build: peak 117 MiB exceeds max_peak 100 MiB). Thresholds are absolute
only — there's no saved run to take a percent of; for relative gating against a prior run
use --benchmark-memory-compare-fail or benchmem compare --fail-on. With repeats > 1 the
gate reads the headline min (the same value the table and JSON report), so it's the
cleanest floor. The ceiling is enforced wherever memory is measured — the benchmark_memory
fixture and the --benchmark-memory patch — but a plain benchmark() call without
--benchmark-memory measures no memory, so the marker is a no-op there.
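To make the ceiling rule concrete, here is a minimal sketch of how a size string could be turned into bytes and compared against the headline minimum. The helpers `parse_size` and `exceeds_ceiling` are hypothetical names for illustration, not the plugin's own code; the sketch assumes whole-number sizes and the binary units listed in the table above.

```python
import re

# Binary units, as listed for max_peak / max_allocated.
_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def parse_size(value):
    """Hypothetical helper: turn "100MiB" or a bare int into bytes."""
    if isinstance(value, int):
        return value
    match = re.fullmatch(r"\s*(\d+)\s*(B|KiB|MiB|GiB)\s*", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


def exceeds_ceiling(series, ceiling):
    """Gate on the headline value: the minimum across repeats."""
    return min(series) > parse_size(ceiling)


# 117 MiB measured against a 100 MiB ceiling trips the gate.
print(exceeds_ceiling([117 * 1024**2, 120 * 1024**2], "100MiB"))  # True
print(exceeds_ceiling([90 * 1024**2], "100MiB"))  # False
```

Because the comparison uses the minimum, one noisy high pass among several repeats does not fail the test on its own.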
Scope: this gates the benchmarked action only (the isolated call pytest-benchmem measures), not the whole test. For a whole-test limit or leak check, use pytest-memray's
limit_memory / limit_leaks; see the README's "With pytest-memray".
Why once, when timing reruns many times? Peak memory is allocator demand — the bytes
your code requests for a given code path and inputs — not a wall-clock number, so it's
near-deterministic and one pass is usually representative. Raise repeats when the peak
isn't deterministic (hash randomization, GC timing, randomized inputs) to settle the
min floor and quantify the spread (the min/mean/max columns and --stat stddev).
The benchmark_memory fixture¶
Depends on pytest-benchmark's benchmark fixture; times via pytest-benchmark, then
measures peak in a separate untimed pass.
Order — timing first, then memory. Every call form runs pytest-benchmark's timing
(calibration + all rounds) first, then the memray pass — so memory is measured on an
already-warmed function and the allocator hooks never touch the timing. This holds for
__call__, pedantic, and the --benchmark-memory patch alike. (The standalone
measure_peak / measure_memory have no timing phase, so they measure cold — warm up
first, or use repeats > 1, if a cold first call would distort the peak.)
Call form — times then measures function(*args, **kwargs):
benchmark_memory(sorted, data)
Pedantic form — explicit control, like pytest-benchmark's pedantic plus a
memory pass:
benchmark_memory.pedantic(target, args=(), kwargs=None, setup=None,
                          rounds=1, warmup_rounds=0, iterations=1)
- setup: a callable run untracked before each measured call; if it returns (args, kwargs), those supply the call's arguments. Use it to rebuild fresh state each round, essential for side-effectful workloads.
- rounds, warmup_rounds, iterations: as in pytest-benchmark.
Mostly memory, little timing? There's no memory-only switch — the entry rides
pytest-benchmark's timing. To trim it: --benchmark-min-rounds=1 --benchmark-max-time=0
(no test changes), or pedantic(rounds=1, warmup_rounds=0) for a single call. For pure
memory outside pytest, use measure_peak / measure_memory.
Attributes (available after a call):
| Attribute | What |
|---|---|
| extra_info | pytest-benchmark's per-benchmark dict. Set scalars here to attach analysis dims; the memory blob lands here under the benchmem key. |
| peak_bytes | peak memory (bytes) from the last call, or None before any call. |
| result | the full MemoryResult from the last call, or None. |
The extra_info.benchmem blob¶
Each measured benchmark stores this dict under extra_info["benchmem"] — three flat
per-repeat series, one entry per memray pass. Every reported number (headline peak =
min, any --stat) derives from these on read:
| Key | What |
|---|---|
| peak_bytes | per-repeat high-water of live bytes: the peak metric (headline = min) |
| allocations | per-repeat allocation count: the allocations metric |
| total_bytes | per-repeat total bytes allocated: the allocated metric (churn peak hides) |
{"peak_bytes": [800000, 805000], "allocations": [12, 12], "total_bytes": [800000, 805000]}
See Metrics for when to reach for each, and --stat for distributions.
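Since every reported number derives from the per-repeat series, the statistics can be recomputed by hand from the blob. The sketch below does so with the standard library on the example blob above; `summarise` is a hypothetical helper, and using the sample standard deviation is an assumption, since this page does not say which variant the plugin reports.

```python
import statistics

blob = {
    "peak_bytes": [800000, 805000],
    "allocations": [12, 12],
    "total_bytes": [800000, 805000],
}


def summarise(series):
    """Hypothetical helper: the stats named by --benchmark-memory-stats / --stat."""
    return {
        "min": min(series),  # the headline value
        "mean": statistics.fmean(series),
        "max": max(series),
        "median": statistics.median(series),
        # Assumption: sample stddev; a single pass has no spread.
        "stddev": statistics.stdev(series) if len(series) > 1 else 0.0,
    }


peak = summarise(blob["peak_bytes"])
print(peak["min"])   # 800000
print(peak["mean"])  # 802500.0
print(peak["max"])   # 805000
```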
CLI — benchmem¶
Installed with pytest-benchmem[plot]. The subcommands and their options,
straight from --help:
!benchmem --help
Usage: benchmem [OPTIONS] COMMAND [ARGS]...

 pytest-benchmem — plot and compare benchmark runs.

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.      │
│ --show-completion             Show completion for the current shell, to copy │
│                               it or customize the installation.              │
│ --help                        Show this message and exit.                    │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────╮
│ plot    Render an interactive plotly view from one or more pytest-benchmark │
│         runs.                                                                │
│ compare Print a per-id comparison table across two or more runs (and         │
│         optionally gate CI).                                                 │
│ sweep   Run a benchmark suite across several installed versions of a         │
│         package.                                                             │
╰──────────────────────────────────────────────────────────────────────────────╯
benchmem compare¶
A per-id delta table (b − a) with percent change; ids in only one run show —.
!benchmem compare --help
Usage: benchmem compare [OPTIONS] RUNS...

 Print a per-id comparison table across two or more runs (and optionally gate CI).

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    runs      RUNS...  Two or more pytest-benchmark runs, oldest → newest   │
│                         (a sweep is N).                                      │
│                         [required]                                           │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --metric     [time|peak|allocated|allocat  Metric: time | peak |             │
│              ions|memory]                  allocated | allocations |         │
│                                            memory (memory is an alias of     │
│                                            peak; pair with --stat for a      │
│                                            distribution).                    │
│                                            [default: time]                   │
│ --stat       TEXT                          Distribution stat over each       │
│                                            benchmark's per-repeat series     │
│                                            (min | max | mean | median |      │
│                                            stddev) for                       │
│                                            peak/allocated/allocations.       │
│                                            Default: the headline value.      │
│ --sort       TEXT                          Row order: name (id) | value      │
│                                            (largest in the last run) |       │
│                                            change.                           │
│                                            [default: name]                   │
│ --csv        PATH                          Also write the raw (unscaled)     │
│                                            comparison to this CSV file.      │
│ --fail-on    TEXT                          Exit non-zero on a regression     │
│                                            of the first run vs the last.     │
│                                            FIELD:THRESHOLD, repeatable —     │
│                                            e.g. --fail-on peak:10%           │
│                                            --fail-on peak:5MiB.              │
│ --help                                     Show this message and exit.       │
╰──────────────────────────────────────────────────────────────────────────────╯
--metric is one of time, peak, allocated, allocations, or memory (an
alias for peak); pair it with --stat (min/max/mean/median/stddev) for a
distribution over the per-repeat series. --fail-on FIELD:THRESHOLD (repeatable) exits
non-zero past a threshold; FIELD is peak, allocated, allocations, or time,
and THRESHOLD is either a percent (peak:10%) or an absolute:
- bytes fields (peak, allocated): 5MiB (units B/KiB/MiB/GiB)
- allocations: a bare count, 5
- time: 1ms (units s/ms/us/µs/ns)
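The difference between a percent and an absolute threshold can be sketched as follows. `parse_threshold` and `regressed` are hypothetical helpers, not the CLI's implementation; the sketch covers byte fields and bare counts only (time units are left out) and assumes a regression means strictly exceeding the threshold.

```python
# Byte units accepted for peak and allocated, per the list above.
_UNITS = (("GiB", 1024**3), ("MiB", 1024**2), ("KiB", 1024), ("B", 1))


def parse_threshold(threshold):
    """Hypothetical parser: "10%" -> ("percent", 10.0); "5MiB" -> ("absolute", 5242880.0)."""
    if threshold.endswith("%"):
        return "percent", float(threshold[:-1])
    for unit, factor in _UNITS:  # multi-letter units first, so "MiB" is not read as "B"
        if threshold.endswith(unit):
            return "absolute", float(threshold[: -len(unit)]) * factor
    return "absolute", float(threshold)  # bare count, e.g. allocations:5


def regressed(first, last, spec):
    """True if `last` grew past the FIELD:THRESHOLD spec relative to `first`."""
    _field, _, threshold = spec.partition(":")
    kind, amount = parse_threshold(threshold)
    delta = last - first
    if kind == "percent":
        return first > 0 and 100 * delta / first > amount
    return delta > amount


base, new = 100 * 1024**2, 112 * 1024**2  # made-up peaks: 100 MiB -> 112 MiB
print(regressed(base, new, "peak:10%"))    # True: +12% is past 10%
print(regressed(base, new, "peak:20MiB"))  # False: +12 MiB is under 20 MiB
```

Passing both forms for one field gates on whichever trips first, which is why the flag is repeatable.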
benchmem plot¶
Writes an interactive plotly view to standalone HTML. The view auto-selects by run
count (1 → scaling, 2 → scatter, 3+ → sweep); override with --view.
!benchmem plot --help
Usage: benchmem plot [OPTIONS] RUNS...

 Render an interactive plotly view from one or more pytest-benchmark runs.

╭─ Arguments ──────────────────────────────────────────────────────────────────╮
│ *    runs      RUNS...  pytest-benchmark JSON file(s). [required]            │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --metric                [time|peak|allocated|al  Metric: time | peak |       │
│                         locations|memory]        allocated | allocations     │
│                                                  | memory (memory is an      │
│                                                  alias of peak; pair with    │
│                                                  --stat for a                │
│                                                  distribution).              │
│                                                  [default: time]             │
│ --view                  TEXT                     compare | scatter |         │
│                                                  sweep | scaling             │
│                                                  (default: by count).        │
│ --facet                 TEXT                     Dim to facet by.            │
│ --x                     TEXT                     scaling: dim for the        │
│                                                  x-axis.                     │
│ --clip                  FLOAT                    Clamp the colour scale.     │
│ --label             -l  TEXT                     Series label per run, in    │
│                                                  order (repeat). Default:    │
│                                                  stem.                       │
│ --output            -o  PATH                     HTML out.                   │
│ --open    --no-open                              [default: no-open]          │
│ --help                                           Show this message and       │
│                                                  exit.                       │
╰──────────────────────────────────────────────────────────────────────────────╯
--facet and --label/-l (a series label per run, repeatable, defaulting to the
file stem) accept the same dims your tests carry.
Public Python API¶
Light to import: pytest_benchmem re-exports only the engine and the readers.
pytest_benchmem.plotting pulls in plotly and pytest_benchmem.sweep shells out to uv,
so import those submodules directly when you need them.
Engine — pytest_benchmem¶
measure_peak(action, repeats=1) -> int
measure_memory(action, repeats=1) -> MemoryResult
action is a zero-arg callable. measure_peak returns the bare peak in bytes;
measure_memory returns the full MemoryResult (peak_bytes, peak_bytes_max,
allocations, total_bytes, repeats).
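The calling convention (a zero-arg action, repeated passes, the minimum as the headline) can be tried without the plugin installed. The sketch below is a stand-in built on the standard library's tracemalloc, which only sees allocations made through Python's allocators; it is not how the memray-based engine measures, and `peak_with_tracemalloc` and `build` are hypothetical names.

```python
import tracemalloc
from functools import partial


def peak_with_tracemalloc(action, repeats=1):
    """Illustration only: min tracemalloc peak over `repeats` passes."""
    peaks = []
    for _ in range(repeats):
        tracemalloc.start()  # each pass starts counting from zero
        try:
            action()
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
        peaks.append(peak)
    return min(peaks)


def build(n):
    return bytearray(n)


# Bind the arguments up front so the action itself takes none.
peak = peak_with_tracemalloc(partial(build, 1_000_000), repeats=3)
print(peak >= 1_000_000)  # True
```

With the real engine the same shape applies: wrap the call in functools.partial or a lambda and pass it as action.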
Readers & loader — pytest_benchmem¶
from_pytest_benchmark(path, *, metric="min") -> (label, [Sample], unit)
memory_from_pytest_benchmark(path, *, field="peak_bytes") -> (label, [Sample], unit)
load_samples(path, *, metric="time", stat="min") -> (label, [Sample], unit)
load_long_df(runs, *, metric="time", stat="min") -> (DataFrame, unit)
discover_runs(root=".benchmarks") -> [Path]
human_bytes(n) -> str
- from_pytest_benchmark reads timing (seconds, from stats); memory_from_pytest_benchmark reads memory (bytes, from extra_info.benchmem).
- load_samples is the unified reader: metric is one of time/peak/allocated/allocations; stat (time only) is min/median/…
- load_long_df stacks runs into the tidy frame the plots pivot, with columns snapshot, id, value, plus one per dim.
- discover_runs() collects saved runs from .benchmarks/ (pytest-benchmark's storage dir, where --benchmark-save/--benchmark-autosave write), so you can hand the readers a directory instead of listing files.
- A Sample is (id, value, dims); dims is a mapping of dim name → str/int/float.
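For orientation, this is roughly what reading memory out of a saved run involves, written against plain JSON rather than the readers themselves. The run below is a trimmed, made-up example; the sketch assumes pytest-benchmark's saved layout (a top-level benchmarks list whose entries carry fullname and extra_info), and using fullname as the id is an assumption. `headline_peaks` is a hypothetical helper.

```python
import json

# A trimmed, made-up saved run; real files carry many more keys.
saved_run = json.loads("""
{"benchmarks": [
  {"fullname": "tests/test_build.py::test_build",
   "extra_info": {"benchmem": {"peak_bytes": [800000, 805000],
                               "allocations": [12, 12],
                               "total_bytes": [800000, 805000]}}},
  {"fullname": "tests/test_build.py::test_plain", "extra_info": {}}
]}
""")


def headline_peaks(run):
    """Map benchmark id -> headline peak (min over repeats),
    skipping benchmarks that recorded no memory."""
    peaks = {}
    for bench in run["benchmarks"]:
        blob = bench.get("extra_info", {}).get("benchmem")
        if blob:
            peaks[bench["fullname"]] = min(blob["peak_bytes"])
    return peaks


print(headline_peaks(saved_run))
# {'tests/test_build.py::test_build': 800000}
```

Benchmarks timed with a plain benchmark() call and no --benchmark-memory carry no benchmem key, which is why the sketch skips them instead of failing.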
Plotting — pytest_benchmem.plotting¶
Every plot_* returns (figure, n_ids):
plot_scaling(snapshots, *, metric="time", x=None, color=None, facet=None, log="auto", labels=None)
plot_scatter(snapshots, *, metric="time", facet=None, clip=None, labels=None)
plot_compare(snapshots, *, metric="time", sort="absolute", facet=None, clip=None, labels=None)
plot_sweep(snapshots, *, metric="time", clip=None, labels=None)
snapshots is a list of run JSON paths. labels names the series per run (defaults
to the file stems) — the API behind plot's -l/--label. plot_compare's sort is
"absolute" (native units) or "relative" (percent).
Sweeps — pytest_benchmem.sweep¶
sweep(versions, run, **provision_kwargs) -> [failed_version_label]
See Cross-version sweeps for the parameters and the Venv object.