Benchmarking Platform

Measure the vendor, not the brochure.

Six vendors, four published benchmark protocols, bootstrap confidence intervals on every cell, dated spoof-hardness witness on every XEB number. The scorecard a procurement committee can sign on.

Request a scorecard run

Click a protocol column header to sort. Illustrative envelope, not a published scorecard.

Vendor · Backend

IBM Heron r2 superconducting

0.9968 ±0.08

0.9954 ±0.12

0.9831 ±0.42

LB-OK consistency

IonQ Tempo trapped-ion

0.9982 ±0.06

0.9971 ±0.09

0.9885 ±0.36

LB-OK consistency

Quantinuum H3 trapped-ion

0.9991 ±0.05

0.9985 ±0.08

0.9905 ±0.34

LB-OK consistency

Rigetti Ankaa-3 superconducting

0.9946 ±0.11

0.9928 ±0.16

0.9742 ±0.58

LB-OK consistency

Pasqal Orion β neutral-atom

0.9892 ±0.14

0.9871 ±0.19

0.9618 ±0.72

LB-WIDE consistency

QuEra Aquila neutral-atom

0.9841 ±0.17

0.9812 ±0.24

0.9498 ±0.89

LB-WIDE consistency

Cell saturation reflects fidelity. Whisker beneath each number is the 95% bootstrap CI half-width. Framework SHA · dfa7c0

Heat map at a glance

One frame. Six vendors. Four protocols.

Hover a cell for methodology, sample size, and dated witness. Darker steel-blue tracks higher fidelity. Amber on Forrelation flags upstream-fit caveats. The same shape across vendors so a procurement reviewer reads the page in seconds.

Vendor / Protocol

XEB

FORR

IBM Heron r2

0.997

0.995

0.983

LB-OK

IonQ Tempo

0.998

0.997

0.989

LB-OK

Quantinuum H3

0.999

0.991

LB-OK

Rigetti Ankaa-3

0.995

0.993

0.974

LB-OK

Pasqal Orion β

0.989

0.987

0.962

LB-WIDE

QuEra Aquila

0.984

0.981

0.950

LB-WIDE

Colour ramp · fidelity

0.93 0.97 1.00

LB-OK · lower bound consistent

LB-WIDE · upstream caveats

The four benchmark protocols

Four published primitives. Each cell on the leaderboard ties to one of them.

The protocols are not a DeployQuantum invention. Each is anchored to a published paper, a published estimator, and a published statistical procedure. The platform binds them to one signed scorecard shape per run.

RB Protocol 01

Randomised Benchmarking

Average Clifford infidelity across length sweeps with a randomised twirl.

What the fit returns

Single-exponential survival fit returns the depolarising channel parameter.

CB Protocol 02

Cycle Benchmarking

Separates coherent from stochastic error per cycle, parameter-efficient relative to full process tomography.

What the fit returns

Pauli-twirled cycle decay returns a sparse Pauli-Lindblad coefficient map per coupling.

XEB Protocol 03

Linear Cross-Entropy

Random-circuit sampling primitive with a dated spoof-hardness witness on every cell.

What the fit returns

Linear cross-entropy estimator returns a fidelity proxy under a published anti-concentration assumption.

FORR Protocol 04

Forrelation (k-fold)

Provable separation rather than a measurement primitive. The lower bound is the proof object.

What the fit returns

Classical lower-bound certificate consistent with the published k-fold Forrelation oracle separation.

Anchors are cited in the manifest by paper identifier. This surface carries no person-name attribution.

Spoof-hardness witness

The frontier moves. The scorecard records the date.

Every XEB cell carries a dated reference. When classical attacks advance, affected scorecards demote with a dated note. The reference date is a field on every scorecard, not a footnote.

2022 Tensor-network attack of Pan-Zhang demotes Sycamore-2019 parameters. · 2024 Extended tensor-network methods further compress XEB-class hardness at fixed (n, depth). · 2019 Forrelation oracle lower bound is information-theoretic and unconditional within the query model. · 2025 Witness frontier reference date is a field per scorecard, not a footnote. · 2026 Frontier registry advances. Affected scorecards demote with a dated reference. · today F_XEB above zero on a noiseless simulator is uninformative about hardware advantage. · 2022 Tensor-network attack of Pan-Zhang demotes Sycamore-2019 parameters. · 2024 Extended tensor-network methods further compress XEB-class hardness at fixed (n, depth). · 2019 Forrelation oracle lower bound is information-theoretic and unconditional within the query model. · 2025 Witness frontier reference date is a field per scorecard, not a footnote. · 2026 Frontier registry advances. Affected scorecards demote with a dated reference. · today F_XEB above zero on a noiseless simulator is uninformative about hardware advantage. ·

The signed scorecard JSON

Every cell on the leaderboard is replayable from this manifest.

Same seed. Same SDK version. Same hash. A third party reruns the script and reproduces the cell byte for byte.

{
  "schema_version": "0.1.0",
  "vendor":         "IBM",
  "backend_name":   "ibm_torino",
  "vendor_sdk_version": "qiskit-1.x (pinned)",
  "benchmark":      "RB_2Q",
  "n_qubits":       2,
  "shots_per_circuit": 1024,
  "transpile": {
    "seed":          7,
    "basis_gates":   ["sx", "rz", "cz"],
    "qubit_indices": [12, 13]
  },
  "measured": {
    "primary_metric_name": "r_per_clifford",
    "value":         0.9968,
    "ci_95_lower":   0.9960,
    "ci_95_upper":   0.9976
  },
  "spoof_hardness_witness": {
    "named_assumption":        "BFNV anti-concentration plus #P-hardness",
    "defeated_attacks":        ["..."],
    "not_defeated_attacks":    ["..."],
    "frontier_reference_date": "2026-05-10"
  },
  "fit_residuals_path":  "build/output/rb_2q_fit_residuals.json",
  "rb_certificate_hash": "<sha-256>",
  "scorecard_hash":      "<sha-256>"
}

Field index

schema_version "0.1.0"
vendor "IBM" | "IonQ" | "Quantinuum" | "Rigetti" | "Pasqal" | "QuEra"
backend_name string
vendor_sdk_version string (pinned at run time)
benchmark "RB_1Q" | "RB_2Q" | "CB" | "XEB" | "Forrelation"
n_qubits integer
shots_per_circuit integer (default 1024)
transpile.seed integer (transpiler seed, deterministic)
transpile.basis_gates array of strings
transpile.qubit_indices array of integers (physical qubits used)

Eleven more fields in the manifest. Each one replayable.

framework_sha dfa7c0

dfa7c0777eca70ffb835a1ec6f3f351cee7b2ea848ba3a7f228082d08db80bc7

Pinned in every manifest. The classical pre and post script hashes pin alongside, and the scorecard hash closes the chain.

What this does not claim

A measurement protocol, not a quantum-advantage proof.

F_XEB above zero on a noiseless simulator is uninformative about advantage. On a real QPU it is hard to spoof classically only under the BFNV anti-concentration plus #P-hardness conjecture pair, which can fail. The witness demotes when the frontier advances. The Forrelation lower bound is information-theoretic at the oracle level. The platform separates the conjecture-conditional claim from the unconditional one in every scorecard so the buyer reads each on its own footing.

Per-vendor scorecards are deferred to Tool 3 and Tool 5 operational, with a real-QPU run in the hardware-run-evidence block. At this ship gate, the release state is internal-only and the leaderboard you see is the shape the scorecard takes when it ships, not a published vendor ranking.

Questions

The five questions a procurement reviewer asks.

01
Why is the leaderboard at hero scale and not a marketing chart further down?

The scorecard is the product. A vendor brochure is a chart. The buyer reads the same shape an auditor reads: cells with a fidelity number, a CI whisker, a witness date. Marketing positioning lives in vendor brochures. Procurement lives in the scorecard.
02
Are the cells published vendor measurements?

No. At this ship gate, the release state is internal-only and the real-QPU run is deferred to Tool 3 plus Tool 5 operational. The leaderboard renders the shape the scorecard takes when shipped. Numbers derive from each vendor's published median two-qubit fidelity in the hardware-roadmap registry, perturbed by a protocol-specific delta. Treat the page as illustrative envelope, not as a vendor ranking.
03
What exactly does the bootstrap whisker mean under each cell?

A 95% bootstrap confidence interval around the primary metric, computed by resampling the underlying circuit-shot dataset. The whisker is a visual rendering of the CI half-width. Where the CI is wide relative to the difference between two vendors, the chart says they are not separable at the run's shot budget. The CI is reported as a number in every scorecard JSON, not only as a whisker.
04
Why does Forrelation report LB-OK rather than a fidelity?

The Forrelation protocol is a provable oracle separation, not a fidelity primitive. The cell is a consistency check against the classical lower bound. LB-OK means the run is consistent with the published lower-bound exponent. LB-WIDE means consistent but with upstream fit caveats in the manifest.
05
What does the witness date mean and when does a scorecard demote?

Every XEB cell carries a dated frontier reference. When the classical-attack frontier advances, the affected scorecards demote in the next registry update, with a dated note recording which attack class moved and which cells were re-rendered. The witness is not a frozen number. It is a measurement under a stated and dated assumption envelope.

Start with one run

Six vendors. Four protocols. One signed scorecard.

Request a run. We respond within one business day. The customer-asset attestation is a precondition, and the release-gate state is recorded against your engagement, not against a generic download.

Request a scorecard run Book a scoping call

[email protected]

Sibling solutions Full catalog →

Feasibility Audit Cryptanalysis Pair Vendor-Agnostic Transpiler Chemistry Compiler Mitigation Spine

Measure the vendor, not the brochure.

One frame. Six vendors. Four protocols.

Four published primitives. Each cell on the leaderboard ties to one of them.

Randomised Benchmarking

Cycle Benchmarking

Linear Cross-Entropy

Forrelation (k-fold)

The frontier moves. The scorecard records the date.

Every cell on the leaderboard is replayable from this manifest.

A measurement protocol, not a quantum-advantage proof.

The five questions a procurement reviewer asks.

Why is the leaderboard at hero scale and not a marketing chart further down?

Are the cells published vendor measurements?

What exactly does the bootstrap whisker mean under each cell?

Why does Forrelation report LB-OK rather than a fidelity?

What does the witness date mean and when does a scorecard demote?

Six vendors. Four protocols. One signed scorecard.