How we grade a relay
Fully transparent, explainable scoring — no black box. Every grade and every verdict label has a defined meaning. We keep detection internals (probes & fingerprints) secret to prevent evasion, but the scoring logic itself is fully public.
Certification grades
Mainstream models verified with strong evidence (trust ≥ 90, including a cryptographic signature or official-cloud confirmation); no substitution or downgrade.
Mainstream models are genuine (trust ≥ 75); no substitution or downgrade.
Some models unconfirmed, or same-vendor downgrade suspected.
At least one mainstream model failed verification. We publish only positive certification: relays that do not pass are simply "not certified", and we make no public substitution claims.
Capping rule: a few genuine models can never mask a substituted one
A cross-vendor mismatch (substitution) on even one mainstream model caps the whole relay at "not certified", however genuine the others are; a same-vendor downgrade caps it at B. Flagship models also carry more weight, so a relay cannot inflate its score with cheap genuine models while substituting a flagship.
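A minimal sketch of how the grades and capping rule above could combine, assuming each mainstream model's trust score comes from the per-model label table (100 down to 5) and that a model outside the reference set is passed as `None`. The function name and the grade names `tier-1` and `tier-2` are hypothetical: this text names the cap levels ("not certified", B) but not the two top grades.

```python
from __future__ import annotations

NOT_CERTIFIED = "not certified"


def relay_grade(mainstream_scores: dict[str, int | None]) -> str:
    """Grade a relay from the trust scores of its mainstream models.

    Assumptions: scores follow the per-model label table (100 down to 5),
    None means "not yet covered", and "tier-1" / "tier-2" are placeholder
    names for the two top grades.
    """
    # Models outside the reference set carry no verdict and are ignored.
    scores = [s for s in mainstream_scores.values() if s is not None]
    if not scores:
        return NOT_CERTIFIED  # assumption: nothing verified, nothing to certify

    # Cap 1: one failed verification (10) or rejected signature (5) sinks the
    # whole relay, however genuine the other models are.
    if any(s <= 10 for s in scores):
        return NOT_CERTIFIED
    # Cap 2: a suspected same-vendor downgrade (30) or an unconfirmed model (50)
    # limits the relay to B.
    if any(s < 75 for s in scores):
        return "B"
    # From here on every mainstream model scores at least 75 (genuine).
    if all(s >= 90 for s in scores):
        return "tier-1"  # placeholder: strong evidence (trust >= 90) everywhere
    return "tier-2"      # placeholder: all genuine (trust >= 75)
```

Because the caps are checked before anything else, a single substituted model decides the outcome in this sketch no matter how many genuine models surround it.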
Trust index (0–100)
The relay-level trust index is a weighted average of per-model trust scores (flagships like opus / gpt-5 weigh more than cheap tiers), counting only models actually tested. Models outside our reference set are excluded and labelled separately — they neither help nor hurt the score.
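The weighted average can be sketched as follows. The tier weights are invented for illustration (the text says only that flagships weigh more than cheap tiers), and `None` marks a model that was not tested or falls outside the reference set, so it is skipped entirely.

```python
from __future__ import annotations

# Hypothetical weights: the text states only that flagships (e.g. opus, gpt-5)
# weigh more than cheap tiers; the real weights are not given here.
TIER_WEIGHTS = {"flagship": 3.0, "mid": 2.0, "cheap": 1.0}


def trust_index(results: dict[str, tuple[int | None, str]]) -> float | None:
    """Weighted average of per-model trust scores.

    `results` maps model name -> (trust score, tier). A score of None means the
    model was not tested or is outside the reference set, so it is excluded.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for score, tier in results.values():
        if score is None:
            continue  # excluded: neither helps nor hurts the index
        weight = TIER_WEIGHTS[tier]
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        return None  # no scored models, so no index
    return weighted_sum / total_weight
```

With `opus` at 100 (flagship), `haiku` at 75 (cheap) and an uncovered model left out, the index is (3·100 + 1·75) / 4 = 93.75. Drop the flagship to 10 and it falls to (3·10 + 1·75) / 4 = 26.25, which shows why cheap genuine models cannot paper over a substituted flagship.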
Per-model verdict labels
Each model is judged independently, with a 0–100 trust score and a clearly defined label:
| Trust score | Label | Meaning |
|---|---|---|
| 100 | Cryptographically verified | Native signature passes official replay (strongest evidence). |
| 92 | Genuine · official-cloud resale | Genuine model resold via Bedrock / Vertex official cloud — no native signature, but channel fingerprint + multiple signals agree. |
| 90 | Behavioral-fingerprint verified | High-confidence behavioral-fingerprint match to the official-source reference. |
| 85 | Multi-signal verified | Multiple independent signals agree (unsigned). |
| 75 | Genuine · unsigned | Behaviorally genuine, but without cryptographic-grade evidence. |
| 50 | Unconfirmed | Insufficient signal to reach a confident verdict. |
| 30 | Same-vendor downgrade suspected | Appears swapped to a cheaper same-vendor tier. |
| 10 | Failed verification | Behaves like a different (cross-vendor) model. |
| 5 | Signature rejected | Claimed signature failed official verification. |
| — | Not yet covered (excluded) | Outside our reference set — no verdict, excluded from the score. |
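One way to hold the table in code is an enum whose members carry the published label and trust score, with `None` standing for the excluded "not yet covered" case. The member names are hypothetical; the labels and scores are copied from the table above.

```python
from __future__ import annotations

from enum import Enum


class Verdict(Enum):
    """Per-model verdict labels and trust scores (member names are hypothetical)."""

    CRYPTO_VERIFIED = ("Cryptographically verified", 100)
    OFFICIAL_CLOUD = ("Genuine · official-cloud resale", 92)
    FINGERPRINT_VERIFIED = ("Behavioral-fingerprint verified", 90)
    MULTI_SIGNAL = ("Multi-signal verified", 85)
    GENUINE_UNSIGNED = ("Genuine · unsigned", 75)
    UNCONFIRMED = ("Unconfirmed", 50)
    DOWNGRADE_SUSPECTED = ("Same-vendor downgrade suspected", 30)
    FAILED = ("Failed verification", 10)
    SIGNATURE_REJECTED = ("Signature rejected", 5)
    NOT_COVERED = ("Not yet covered (excluded)", None)

    def __init__(self, label: str, score: int | None) -> None:
        # Enum unpacks each member's tuple value into these two fields.
        self.label = label
        self.score = score

    @property
    def counts_toward_index(self) -> bool:
        # Only models inside the reference set carry a score.
        return self.score is not None
```

Keeping `NOT_COVERED` at `None` rather than 0 is what lets an excluded model neither help nor hurt the relay-level index.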
How we test
Behavioral fingerprinting
A set of probes samples the model's answer-style distribution and compares it against our official-source reference fingerprints, identifying which model it actually behaves like, even when a substitute has been prompted to pose as Claude.
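The probes and reference fingerprints themselves are kept secret, so the following is only a generic illustration of nearest-reference matching, not the production method. It assumes each probe answer falls into a discrete style category and uses total variation distance as a swapped-in comparison measure; the category and model names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def distribution(answers: list[str]) -> dict[str, float]:
    """Turn sampled probe answers into an empirical frequency distribution."""
    counts = Counter(answers)
    n = len(answers)
    return {answer: c / n for answer, c in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: half the L1 distance between two distributions."""
    keys = p.keys() | q.keys()
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def closest_reference(
    answers: list[str], references: dict[str, dict[str, float]]
) -> tuple[str, float]:
    """Return (model, distance) for the reference fingerprint nearest the samples.

    Assumption: a plain nearest-neighbour rule; the real matcher is not public.
    """
    observed = distribution(answers)
    return min(
        ((model, total_variation(observed, ref)) for model, ref in references.items()),
        key=lambda pair: pair[1],
    )
```

For example, with references `{"model-a": {"A": 0.8, "B": 0.2}, "model-b": {"A": 0.2, "B": 0.8}}`, seven "A" answers and three "B" answers land nearest `model-a` at a distance of about 0.1. In this sketch the match depends on how the model answers rather than on what it says about itself, so a substitute told to claim it is Claude would still land nearest its own reference.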
Cryptographic signature / official replay
Models supporting native signatures are verified via official replay; genuine models resold through Bedrock / Vertex official cloud lack native signatures but are cross-confirmed via channel fingerprint and multiple signals.
Multi-signal cross-check + per-model verdict
High confidence requires several independent signals (identity, latency, capability, rank tests) to agree. Each model is judged independently — one verified model never speaks for the whole relay.
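A minimal sketch of the agreement check for a single model, assuming each signal (identity, latency, capability, rank) reports the model it points to, or `None` when inconclusive. `MIN_AGREEING` is a hypothetical threshold, and requiring zero dissenting signals is an added conservative assumption, not a documented rule.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical threshold: the text says "several" signals must agree, not how many.
MIN_AGREEING = 3


def cross_check(signals: dict[str, str | None]) -> tuple[str | None, bool]:
    """Combine independent signals for ONE model into (consensus, high_confidence).

    `signals` maps a signal name such as "identity" or "latency" to the model
    that signal points at, or None if it was inconclusive.
    """
    votes = Counter(m for m in signals.values() if m is not None)
    if not votes:
        return None, False  # no conclusive signal to agree on
    model, agreeing = votes.most_common(1)[0]
    dissenting = sum(votes.values()) - agreeing
    # Assumption: high confidence needs enough agreeing signals AND no conflicting one.
    return model, agreeing >= MIN_AGREEING and dissenting == 0
```

Calling this once per model, rather than once per relay, mirrors the rule that one verified model never speaks for the others.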
Why you can trust it
Per-model verdicts
Within one relay, Claude may be genuine while GPT is swapped. Each model is judged on its own, so one verified model never speaks for the whole relay.
Honest boundaries
Models we cannot cover are marked "not yet covered": no guessing, no false accusations. Same-vendor downgrade detection uses double-guard thresholds to avoid false positives.
Positive-only certification
We publish only verification and certification results. Poorly performing relays simply lack a badge, rank lower, or drop off the rankings; we never publish negative accusations, for brand and legal safety.
⚠️ Results are probabilistic signals, not legal proof. This certification is a point-in-time snapshot; a relay's backend may change at any time, and continuous assurance requires paid monitoring. Models outside our reference set are marked "not yet covered" with no verdict. We only publish positive verification: relays that do not pass are simply "not certified".