Benchmarking Platform

Measure the vendor, not the brochure.

Six vendors, four published benchmark protocols, bootstrap confidence intervals on every cell, dated spoof-hardness witness on every XEB number. The scorecard a procurement committee can sign on.

Click a protocol column header to sort. Illustrative envelope, not a published scorecard.
Vendor · Backend
IBM Heron r2 superconducting
0.9968 ±0.08
0.9954 ±0.12
0.9831 ±0.42
LB-OK consistency
IonQ Tempo trapped-ion
0.9982 ±0.06
0.9971 ±0.09
0.9885 ±0.36
LB-OK consistency
Quantinuum H3 trapped-ion
0.9991 ±0.05
0.9985 ±0.08
0.9905 ±0.34
LB-OK consistency
Rigetti Ankaa-3 superconducting
0.9946 ±0.11
0.9928 ±0.16
0.9742 ±0.58
LB-OK consistency
Pasqal Orion β neutral-atom
0.9892 ±0.14
0.9871 ±0.19
0.9618 ±0.72
LB-WIDE consistency
QuEra Aquila neutral-atom
0.9841 ±0.17
0.9812 ±0.24
0.9498 ±0.89
LB-WIDE consistency
Cell saturation reflects fidelity. Whisker beneath each number is the 95% bootstrap CI half-width. Framework SHA · dfa7c0

One frame. Six vendors. Four protocols.

Hover a cell for methodology, sample size, and dated witness. Darker steel-blue tracks higher fidelity. Amber on Forrelation flags upstream-fit caveats. The same shape across vendors so a procurement reviewer reads the page in seconds.

Vendor / Protocol
RB
CB
XEB
FORR
IBM Heron r2
0.997 Randomised Benchmarking · Heron r2 RB-2Q on the most-used coupler. Heavy-hex topology. n=1024 shots · CI±0.08
0.995 Cycle Benchmarking · Heron r2 Sparse Pauli-Lindblad fit per coupling. n=1024 shots · CI±0.12
0.983 Linear Cross-Entropy · Heron r2 F_XEB at depth 12, n=16. Tensor-network frontier as of witness date. n=1024 shots · CI±0.42
LB-OK Forrelation (k-fold) · Heron r2 k=4 lower bound consistent at N=16 by exhaustive check. k=4 oracle · N=16
IonQ Tempo
0.998 Randomised Benchmarking · Tempo RB-2Q across all-to-all chain. Long coherence relative to gate time. n=1024 shots · CI±0.06
0.997 Cycle Benchmarking · Tempo Per-Pauli decay rates separated. n=1024 shots · CI±0.09
0.989 Linear Cross-Entropy · Tempo F_XEB at depth 12, n=16. Witness names defeated attack classes. n=1024 shots · CI±0.36
LB-OK Forrelation (k-fold) · Tempo k=4 lower bound consistent at N=16. k=4 oracle · N=16
Quantinuum H3
0.999 Randomised Benchmarking · H3 RB-2Q target on zone-graph racetrack. Spec-sheet number, not third-party measured. n=1024 shots · CI±0.05
0.999 Cycle Benchmarking · H3 Target per-Pauli rate. Pre-spine, deferred. n=1024 shots · CI±0.08
0.991 Linear Cross-Entropy · H3 F_XEB at depth 12, n=16. Witness anchored to the BFNV plus #P-hardness pair. n=1024 shots · CI±0.34
LB-OK Forrelation (k-fold) · H3 k=4 lower bound consistent at N=16. k=4 oracle · N=16
Rigetti Ankaa-3
0.995 Randomised Benchmarking · Ankaa-3 RB-2Q on tunable-coupler ladder. n=1024 shots · CI±0.11
0.993 Cycle Benchmarking · Ankaa-3 Cycle decay across two-qubit gates. n=1024 shots · CI±0.16
0.974 Linear Cross-Entropy · Ankaa-3 F_XEB at depth 12, n=16. Wider band, register sensitive. n=1024 shots · CI±0.58
LB-OK Forrelation (k-fold) · Ankaa-3 k=4 lower bound consistent at N=16. k=4 oracle · N=16
Pasqal Orion β
0.989 Randomised Benchmarking · Orion β RB-2Q on Rydberg-blockade CZ. Digital mode. n=1024 shots · CI±0.14
0.987 Cycle Benchmarking · Orion β Cycle decay on the digital surface. n=1024 shots · CI±0.19
0.962 Linear Cross-Entropy · Orion β F_XEB at depth 8, n=16. Mid-circuit measurement not supported. n=1024 shots · CI±0.72
LB-WIDE Forrelation (k-fold) · Orion β k=4 lower bound consistent but check CI of upstream fit. k=4 oracle · N=16
QuEra Aquila
0.984 Randomised Benchmarking · Aquila Analog Hamiltonian. RB on the digital surface where present. n=1024 shots · CI±0.17
0.981 Cycle Benchmarking · Aquila Cycle-style decay, Rydberg-blockade pulses. n=1024 shots · CI±0.24
0.950 Linear Cross-Entropy · Aquila F_XEB at depth 8, n=14. Wide band, analog modality. n=1024 shots · CI±0.89
LB-WIDE Forrelation (k-fold) · Aquila k=4 lower bound consistent with upstream caveats. k=4 oracle · N=16
Colour ramp · fidelity
0.93 0.97 1.00
LB-OK · lower bound consistent
LB-WIDE · upstream caveats

Four published primitives. Each cell on the leaderboard ties to one of them.

The protocols are not a DeployQuantum invention. Each is anchored to a published paper, a published estimator, and a published statistical procedure. The platform binds them to one signed scorecard shape per run.

RB Protocol 01

Randomised Benchmarking

Average Clifford infidelity across length sweeps with a randomised twirl.

What the fit returns

Single-exponential survival fit returns the depolarising channel parameter.

CB Protocol 02

Cycle Benchmarking

Separates coherent from stochastic error per cycle, parameter-efficient relative to full process tomography.

What the fit returns

Pauli-twirled cycle decay returns a sparse Pauli-Lindblad coefficient map per coupling.

XEB Protocol 03

Linear Cross-Entropy

Random-circuit sampling primitive with a dated spoof-hardness witness on every cell.

What the fit returns

Linear cross-entropy estimator returns a fidelity proxy under a published anti-concentration assumption.

FORR Protocol 04

Forrelation (k-fold)

Provable separation rather than a measurement primitive. The lower bound is the proof object.

What the fit returns

Classical lower-bound certificate consistent with the published k-fold Forrelation oracle separation.

Anchors are cited in the manifest by paper identifier. This surface carries no person-name attribution.

The frontier moves. The scorecard records the date.

Every XEB cell carries a dated reference. When classical attacks advance, affected scorecards demote with a dated note. The reference date is a field on every scorecard, not a footnote.

2022 Tensor-network attack of Pan-Zhang demotes Sycamore-2019 parameters. · 2024 Extended tensor-network methods further compress XEB-class hardness at fixed (n, depth). · 2019 Forrelation oracle lower bound is information-theoretic and unconditional within the query model. · 2025 Witness frontier reference date is a field per scorecard, not a footnote. · 2026 Frontier registry advances. Affected scorecards demote with a dated reference. · today F_XEB above zero on a noiseless simulator is uninformative about hardware advantage. · 2022 Tensor-network attack of Pan-Zhang demotes Sycamore-2019 parameters. · 2024 Extended tensor-network methods further compress XEB-class hardness at fixed (n, depth). · 2019 Forrelation oracle lower bound is information-theoretic and unconditional within the query model. · 2025 Witness frontier reference date is a field per scorecard, not a footnote. · 2026 Frontier registry advances. Affected scorecards demote with a dated reference. · today F_XEB above zero on a noiseless simulator is uninformative about hardware advantage. ·

Every cell on the leaderboard is replayable from this manifest.

Same seed. Same SDK version. Same hash. A third party reruns the script and reproduces the cell byte for byte.

{
  "schema_version": "0.1.0",
  "vendor":         "IBM",
  "backend_name":   "ibm_torino",
  "vendor_sdk_version": "qiskit-1.x (pinned)",
  "benchmark":      "RB_2Q",
  "n_qubits":       2,
  "shots_per_circuit": 1024,
  "transpile": {
    "seed":          7,
    "basis_gates":   ["sx", "rz", "cz"],
    "qubit_indices": [12, 13]
  },
  "measured": {
    "primary_metric_name": "r_per_clifford",
    "value":         0.9968,
    "ci_95_lower":   0.9960,
    "ci_95_upper":   0.9976
  },
  "spoof_hardness_witness": {
    "named_assumption":        "BFNV anti-concentration plus #P-hardness",
    "defeated_attacks":        ["..."],
    "not_defeated_attacks":    ["..."],
    "frontier_reference_date": "2026-05-10"
  },
  "fit_residuals_path":  "build/output/rb_2q_fit_residuals.json",
  "rb_certificate_hash": "<sha-256>",
  "scorecard_hash":      "<sha-256>"
}
Field index
  • schema_version "0.1.0"
  • vendor "IBM" | "IonQ" | "Quantinuum" | "Rigetti" | "Pasqal" | "QuEra"
  • backend_name string
  • vendor_sdk_version string (pinned at run time)
  • benchmark "RB_1Q" | "RB_2Q" | "CB" | "XEB" | "Forrelation"
  • n_qubits integer
  • shots_per_circuit integer (default 1024)
  • transpile.seed integer (transpiler seed, deterministic)
  • transpile.basis_gates array of strings
  • transpile.qubit_indices array of integers (physical qubits used)

Eleven more fields in the manifest. Each one replayable.

framework_sha dfa7c0
dfa7c0777eca70ffb835a1ec6f3f351cee7b2ea848ba3a7f228082d08db80bc7

Pinned in every manifest. The classical pre and post script hashes pin alongside, and the scorecard hash closes the chain.

A measurement protocol, not a quantum-advantage proof.

F_XEB above zero on a noiseless simulator is uninformative about advantage. On a real QPU it is hard to spoof classically only under the BFNV anti-concentration plus #P-hardness conjecture pair, which can fail. The witness demotes when the frontier advances. The Forrelation lower bound is information-theoretic at the oracle level. The platform separates the conjecture-conditional claim from the unconditional one in every scorecard so the buyer reads each on its own footing.

Per-vendor scorecards are deferred to Tool 3 and Tool 5 operational, with a real-QPU run in the hardware-run-evidence block. At this ship gate, the release state is internal-only and the leaderboard you see is the shape the scorecard takes when it ships, not a published vendor ranking.

The five questions a procurement reviewer asks.

  1. 01

    Why is the leaderboard at hero scale and not a marketing chart further down?

    The scorecard is the product. A vendor brochure is a chart. The buyer reads the same shape an auditor reads: cells with a fidelity number, a CI whisker, a witness date. Marketing positioning lives in vendor brochures. Procurement lives in the scorecard.

  2. 02

    Are the cells published vendor measurements?

    No. At this ship gate, the release state is internal-only and the real-QPU run is deferred to Tool 3 plus Tool 5 operational. The leaderboard renders the shape the scorecard takes when shipped. Numbers derive from each vendor's published median two-qubit fidelity in the hardware-roadmap registry, perturbed by a protocol-specific delta. Treat the page as illustrative envelope, not as a vendor ranking.

  3. 03

    What exactly does the bootstrap whisker mean under each cell?

    A 95% bootstrap confidence interval around the primary metric, computed by resampling the underlying circuit-shot dataset. The whisker is a visual rendering of the CI half-width. Where the CI is wide relative to the difference between two vendors, the chart says they are not separable at the run's shot budget. The CI is reported as a number in every scorecard JSON, not only as a whisker.

  4. 04

    Why does Forrelation report LB-OK rather than a fidelity?

    The Forrelation protocol is a provable oracle separation, not a fidelity primitive. The cell is a consistency check against the classical lower bound. LB-OK means the run is consistent with the published lower-bound exponent. LB-WIDE means consistent but with upstream fit caveats in the manifest.

  5. 05

    What does the witness date mean and when does a scorecard demote?

    Every XEB cell carries a dated frontier reference. When the classical-attack frontier advances, the affected scorecards demote in the next registry update, with a dated note recording which attack class moved and which cells were re-rendered. The witness is not a frozen number. It is a measurement under a stated and dated assumption envelope.

Start with one run

Six vendors. Four protocols. One signed scorecard.

Request a run. We respond within one business day. The customer-asset attestation is a precondition, and the release-gate state is recorded against your engagement, not against a generic download.