Reading these scores

This page is for anyone consuming the suite programmatically - an agent, a dashboard, a script - and for anyone who wants to read a number here and know exactly what it means. A single percentage is easy to misread as a verdict, so here's how the figures are built and where to get them as data.

Get the data, don't scrape the page

Every figure on the site is published as JSON, regenerated at build time from the same results the pages render from. Read that instead of parsing HTML:

  • /data/latest.json - the latest run in full: every target's tier scores, coverage, and per-capability and per-operation-area state.
  • /data/runs.json - the whole history, newest first: per-target tier scores, coverage and run-over-run movement for every recorded run.
  • /data/index.json - a discovery manifest: the tier and capability vocabularies, where each endpoint lives, and the licence.
  • /feed.xml - an Atom feed, one entry per run.
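If you only want to know when a new run lands, the feed is the lightest thing to poll. Here's a minimal sketch of reading it with Python's standard library. It assumes each entry carries the standard Atom id, title and updated elements; the page above only promises one entry per run, so treat those fields as an assumption and check the real feed. The sample document is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The Atom namespace is fixed by the Atom specification (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_runs(feed_xml: str) -> list[dict]:
    """Return one dict per <entry> in an Atom feed, in document order."""
    root = ET.fromstring(feed_xml)
    runs = []
    for entry in root.findall("atom:entry", ATOM_NS):
        runs.append({
            # Assumed fields: any of these may be absent in the real feed,
            # in which case the value is an empty string.
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return runs


# Hypothetical feed content, for illustration only.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<title>Example feed</title>"
    "<entry><id>urn:example:run-2</id><title>Run 2</title>"
    "<updated>2024-01-02T00:00:00Z</updated></entry>"
    "<entry><id>urn:example:run-1</id><title>Run 1</title>"
    "<updated>2024-01-01T00:00:00Z</updated></entry>"
    "</feed>"
)

if __name__ == "__main__":
    for run in list_runs(SAMPLE_FEED):
        print(run["updated"], run["title"])
```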

Every target carries the identical schema, live AWS DynamoDB included. The data is published under CC BY 4.0: use it freely, just credit paritysuite.org. The schema is versioned with a schemaVersion field, and a breaking change bumps it.
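Because a breaking change bumps schemaVersion, a consumer can refuse data it wasn't written for rather than misread it. The sketch below shows one way to do that. It makes several assumptions that you should check against /data/index.json: that the site is served from https://paritysuite.org, that schemaVersion sits at the top level of each document, and that it is an integer. The supported version of 1 is a placeholder, not the real current version.

```python
import json
import urllib.request

BASE_URL = "https://paritysuite.org"  # assumption: the host the data is served from
SUPPORTED_SCHEMA_VERSIONS = (1,)      # placeholder: the versions this consumer understands


class UnsupportedSchema(Exception):
    """Raised when a document's schemaVersion is missing or not one we handle."""


def check_schema(doc: dict, supported=SUPPORTED_SCHEMA_VERSIONS) -> dict:
    """Return doc unchanged if its schemaVersion is supported, else raise."""
    # Assumption: schemaVersion is a top-level field.
    version = doc.get("schemaVersion")
    if version not in supported:
        raise UnsupportedSchema(
            f"schemaVersion {version!r} not in supported versions {supported!r}"
        )
    return doc


def fetch_json(path: str, base_url: str = BASE_URL) -> dict:
    """Fetch one of the published JSON documents and check its schema version."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return check_schema(json.load(resp))
```

A caller would then write something like fetch_json("/data/latest.json") and handle UnsupportedSchema as a signal to update the consumer, not as a transient error to retry.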

What a score actually is

The headline percentage is correctness over the operations a target implements - passed divided by the sum of passed and failed. It is not how much of DynamoDB the target covers. A target that implements a thin slice and gets it right will score highly, which is why every score travels with a coverage figure: the operations implemented out of the total. Correctness tells you whether what it does is right; coverage tells you how much it does. Read both, or a narrow surface looks like broad conformance.
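The two ratios are simple enough to write down. This sketch uses made-up numbers for two hypothetical targets to show how a narrow target can out-score a broad one on correctness alone. The function names are mine, not the suite's.

```python
from typing import Optional


def correctness(passed: int, failed: int) -> Optional[float]:
    """Passed over passed-plus-failed. None if nothing ran at all."""
    ran = passed + failed
    return passed / ran if ran else None


def coverage(implemented: int, total: int) -> float:
    """Operations implemented out of the total number of operations."""
    return implemented / total


# Hypothetical figures, for illustration only.
narrow = {"passed": 40, "failed": 0, "implemented": 12, "total": 60}
broad = {"passed": 190, "failed": 10, "implemented": 57, "total": 60}

if __name__ == "__main__":
    for name, t in (("narrow", narrow), ("broad", broad)):
        print(
            name,
            f"correctness={correctness(t['passed'], t['failed']):.0%}",
            f"coverage={coverage(t['implemented'], t['total']):.0%}",
        )
    # narrow: correctness=100% coverage=20%
    # broad:  correctness=95%  coverage=95%
```

On correctness alone the narrow target wins; with coverage beside it, the picture reverses for most uses.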

Skips are scope, not failure. A skipped test is the target's own feature-probe declining to run because it doesn't implement that operation at all. That's kept out of the score and reported separately. A fail means the operation is there and behaves differently from real DynamoDB, and that counts. They mean opposite things, so don't fold skips into a pass rate.
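If you're tallying raw outcomes yourself, this is roughly what keeping skips out of the score looks like. It's a sketch under assumptions: the outcome labels "pass", "fail" and "skip" are hypothetical, and the published JSON may name them differently.

```python
from collections import Counter


def summarise(outcomes: list[str]) -> dict:
    """Score passes against passes-plus-fails; count skips separately."""
    counts = Counter(outcomes)
    passed, failed, skipped = counts["pass"], counts["fail"], counts["skip"]
    ran = passed + failed
    return {
        "score": passed / ran if ran else None,  # skips never enter this ratio
        "passed": passed,
        "failed": failed,
        "skipped": skipped,                      # reported beside the score
    }


# Hypothetical run: 8 passes, 2 fails, 10 skips.
example = ["pass"] * 8 + ["fail"] * 2 + ["skip"] * 10

if __name__ == "__main__":
    print(summarise(example))
    # score is 0.8 here. Counting the skips as fails would give 8/20 = 0.4,
    # and counting them as passes would give 18/20 = 0.9. Both misstate
    # what the target does.
```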

There are three tiers - Core, Complete and Strict - and one total hides too much. "100% Core, 95% Complete, 80% Strict" tells you far more than "92%". If a user only needs everyday CRUD, the Core score is the one that matters; if they assert on error behaviour in CI, Strict is where a gap bites. Read the tier that maps to what they actually do.
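In code, that means looking up a tier rather than an overall figure. A minimal sketch, assuming you've already pulled one target's tier scores into a dict keyed by tier name. The need labels and the dict shape are hypothetical, and only the two mappings named above are included.

```python
# Which tier answers which question. Both mappings come from the text above;
# the labels on the left are invented for this example.
TIER_FOR_NEED = {
    "everyday-crud": "Core",
    "error-assertions-in-ci": "Strict",
}


def relevant_score(tier_scores: dict[str, float], need: str) -> float:
    """Return the score for the tier that matches what the user actually does."""
    return tier_scores[TIER_FOR_NEED[need]]


# The illustrative figures from the paragraph above.
scores = {"Core": 100.0, "Complete": 95.0, "Strict": 80.0}

if __name__ == "__main__":
    print(relevant_score(scores, "everyday-crud"))           # 100.0
    print(relevant_score(scores, "error-assertions-in-ci"))  # 80.0
```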

DynamoDB sits at the top of every table at a flat 100%. That's the baseline, not a competitor that happened to win: it's the thing everything else is measured against, so it's 100% by definition.

What the numbers don't tell you

A score is tied to a target version, tested on a date, against DynamoDB's behaviour in eu-west-2 on that date. DynamoDB is neither identical across regions nor fixed over time, so a figure here means conformance to a named region on a named date, nothing wider. Both sides move.

And it's behaviour only. The suite says nothing about performance, scalability, durability, cost, or operational fit. A target can match DynamoDB's behaviour perfectly and still be the wrong tool for a job, or the right one despite a lower score. The methodology has the full limitations.

Comparing on a capability

If a decision hangs on a specific feature - PartiQL, transactions, GSIs, LSIs, streams, TTL - don't read off the total. The capabilities page lays out every target against the same capability columns, and the same data is in the capabilities array for each target in /data/latest.json. Pull the column for the feature you care about and read every target's state on it. The suite scores each target against real DynamoDB, never against another target, so the comparison is like-for-like.
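Pulling one capability column out of /data/latest.json might look like the sketch below. Only the per-target capabilities array is confirmed by this page. The top-level "targets" list, the "name", "id" and "state" keys, and the state values are all assumptions made for illustration, so check the real vocabulary in /data/index.json before relying on any of them.

```python
from typing import Optional


def capability_column(latest: dict, capability: str) -> dict[str, Optional[str]]:
    """Map each target's name to its state for one capability.

    A target with no entry for the capability maps to None, so a gap in the
    data stays visible and isn't silently dropped.
    """
    column: dict[str, Optional[str]] = {}
    for target in latest["targets"]:                # assumed key
        state = None
        for cap in target.get("capabilities", []):  # key named on this page
            if cap.get("id") == capability:         # assumed key
                state = cap.get("state")            # assumed key
                break
        column[target["name"]] = state              # assumed key
    return column


# Hypothetical document in the assumed shape, for illustration only.
SAMPLE_LATEST = {
    "targets": [
        {"name": "target-a", "capabilities": [
            {"id": "transactions", "state": "supported"},
            {"id": "streams", "state": "unsupported"},
        ]},
        {"name": "target-b", "capabilities": [
            {"id": "transactions", "state": "partial"},
        ]},
    ]
}

if __name__ == "__main__":
    print(capability_column(SAMPLE_LATEST, "streams"))
```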

The site won't tell you which target to pick. It gives you the evidence per target, on equal terms.

Who maintains this

The suite and this site are built and maintained by Martin Hicks, who also maintains Dynoxide, one of the targets scored here. That relationship is why no figure on the site is hand-authored: each one is derived from the suite's own published results at build time, and the scoring logic is pinned to the suite's own by a test that fails the build if the two ever diverge. A target's score here can't be tuned without changing the suite's published results first, in the open. Real DynamoDB is the baseline, every figure carries the region and date of its measurement, and suggesting a target takes nothing more than opening a GitHub issue.