Run

7 emulators, 817 tests

Results as of this run. The arrow shows each target's movement since the previous run in which it was tested.

Suite grew from 762 to 817 tests this run.

That's 55 new tests measured against every target. Because the suite grew, a downward arrow may reflect the new tests failing rather than the target regressing.
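
As a rough illustration of why a larger suite alone can push a score down, the sketch below recomputes a pass rate before and after the 55 tests were added. The suite sizes (762 and 817) come from this page. The per-target pass counts (`old_passed`, `new_tests_passed`) are made-up numbers for a hypothetical target, and the assumption that a score is simply passed tests divided by total tests is ours; the page does not state how scores are computed.

```python
# Suite sizes reported on this page.
OLD_TOTAL = 762
NEW_TOTAL = 817


def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage (assumed scoring: passed / total)."""
    return 100.0 * passed / total


# Number of tests added this run.
new_tests = NEW_TOTAL - OLD_TOTAL

# Hypothetical target: it passed 700 of the old tests and still passes
# all 700, so it has not regressed, but it passes only 40 of the new ones.
old_passed = 700
new_tests_passed = 40

before = pass_rate(old_passed, OLD_TOTAL)
after = pass_rate(old_passed + new_tests_passed, NEW_TOTAL)

# Movement in percentage points, the unit used for the arrows on this page.
delta_pp = after - before

print(f"new tests: {new_tests}")
print(f"before: {before:.1f}%  after: {after:.1f}%  change: {delta_pp:+.1f}pp")
```

In this hypothetical case the score falls even though every previously passing test still passes, which is the effect the note above warns about.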

  1. live (AWS) full coverage
    100% ground truth
    Tier 1 100%
    Tier 2 100%
    Tier 3 100%
  2. 99.1% (-0.6pp)
    Tier 1 99.5%
    Tier 2 100.0%
    Tier 3 98.0%
  3. 40b3c73db5db full coverage
    89.4% (-0.7pp)
    Tier 1 92.8%
    Tier 2 82.7%
    Tier 3 87.6%
  4. 88.9% (-3.2pp)
    Tier 1 88.2%
    Tier 2 89.8%
    Tier 3 89.6%
  5. 88.2% (-0.9pp)
    Tier 1 94.0%
    Tier 2 79.4%
    Tier 3 83.6%
  6. 83.7% (-2.1pp)
    Tier 1 92.8%
    Tier 2 89.4%
    Tier 3 65.2%
  7. 82.2% (-2.3pp)
    Tier 1 91.4%
    Tier 2 86.1%
    Tier 3 64.8%
  8. 76.4% (-0.6pp)
    Tier 1 91.1%
    Tier 2 13.3%
    Tier 3 72.8%