Run

7 emulators · 699 tests

Results as of this run. The arrow shows each target's movement since the previous run it was tested in. The suite grew this run, so a downward arrow can be the new tests biting rather than a target regressing.

Suite grew from 684 to 699 tests this run.

That's 15 new tests measured against every target. Movement below compares to the previous run, so a fall here can reflect the stricter suite rather than a real regression.
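To make the caveat concrete, here is a minimal sketch of the arithmetic. It assumes the score is simply passed tests divided by total tests; the page does not say how the percentage is computed or how unsupported tests are counted. The target and its pass counts (650 and 9) are hypothetical and not taken from the results below. Only the suite sizes, 684 and 699, come from this run.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage (assumed scoring: passed / total)."""
    return 100.0 * passed / total


def movement_pp(prev_passed: int, prev_total: int,
                curr_passed: int, curr_total: int) -> float:
    """Change in pass rate between two runs, in percentage points."""
    return pass_rate(curr_passed, curr_total) - pass_rate(prev_passed, prev_total)


# Suite sizes reported for this run.
PREV_TOTAL, CURR_TOTAL = 684, 699
new_tests = CURR_TOTAL - PREV_TOTAL

# Hypothetical target: it passes the same 650 old tests in both runs
# (no regression) and passes 9 of the newly added tests.
prev_passed = 650
curr_passed = prev_passed + 9

delta = movement_pp(prev_passed, PREV_TOTAL, curr_passed, CURR_TOTAL)
print(f"{new_tests} new tests, movement {delta:+.1f}pp")
```

Under these assumptions the target's score falls even though it behaves identically on every test it had already seen, which is why a downward movement in this run does not by itself indicate a regression.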

  1. live (AWS) · full coverage
    100% ground truth
    Tier 1 100%
    Tier 2 100%
    Tier 3 100%
  2. 0.10.0 · 14 unsupported
    100.0% (+4.5pp)
    Tier 1 100.0%
    Tier 2 100.0%
    Tier 3 100.0%
  3. v0.1.0 · 42 unsupported
    93.8% (+0.1pp)
    Tier 1 93.2%
    Tier 2 93.3%
    Tier 3 94.9%
  4. 67825c62ff44 · 9 unsupported
    92.6% (-0.9pp)
    Tier 1 93.5%
    Tier 2 87.8%
    Tier 3 93.9%
  5. 2026.5.1 · 8 unsupported
    88.3% (+0.3pp)
    Tier 1 98.6%
    Tier 2 92.7%
    Tier 3 68.7%
  6. d89f8fcc6b1a · 13 unsupported
    87.0% (0.0pp, unchanged)
    Tier 1 97.2%
    Tier 2 89.1%
    Tier 3 69.2%
  7. b852d7c01e53 · full coverage
    87.0% (+21.8pp)
    Tier 1 95.8%
    Tier 2 87.9%
    Tier 3 72.0%
  8. 4.0.0 · 67 unsupported
    83.1% (+0.2pp)
    Tier 1 95.5%
    Tier 2 16.9%
    Tier 3 82.7%