Panshi Sentinel · scoring method, in the open

How we grade a relay

Fully transparent, explainable scoring, with no black box. Every grade and every verdict label has a defined meaning. We keep the detection internals (probes and fingerprints) private so relays cannot evade them, but the scoring logic itself is fully public.

Certification grades

🛡️ Panshi Certified A

Mainstream models verified with strong evidence (trust ≥ 90, including a cryptographic signature or official-cloud confirmation); no substitution or downgrade.

Verified B

Mainstream models are genuine (trust ≥ 75); no substitution or downgrade.

🟡 Partial C

Some models unconfirmed, or same-vendor downgrade suspected.

Not certified

At least one mainstream model failed verification. We publish only positive certification: relays that do not pass are simply "not certified", and we make no public substitution claims.

Capping rule: a few genuine models can never mask a substituted one

A cross-vendor mismatch (substitution) on any single mainstream model drops the whole relay to "not certified", however genuine the other models are; any same-vendor downgrade caps the relay at B. Flagship models also carry more weight, so a relay cannot inflate its score with cheap genuine models while substituting a flagship.
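As a rough illustration of how the grade thresholds and the capping rule could combine, here is a minimal Python sketch. It assumes the A / B thresholds (90 / 75) are applied to the relay-level trust index and that the two caps are applied afterwards; the function names are hypothetical, and the extra evidence requirement for A (signature or official-cloud confirmation) is left out for brevity.

```python
# Grade order from worst to best (names follow the grade list on this page).
RANK = {"Not certified": 0, "C": 1, "B": 2, "A": 3}


def cap(grade: str, ceiling: str) -> str:
    """Return whichever of the two grades is lower."""
    return grade if RANK[grade] <= RANK[ceiling] else ceiling


def relay_grade(trust_index: float, any_substitution: bool, any_downgrade: bool) -> str:
    """Sketch of relay grading: thresholds first, then the capping rule."""
    # Assumption: the A / B thresholds are applied to the relay-level trust index.
    if trust_index >= 90:
        grade = "A"
    elif trust_index >= 75:
        grade = "B"
    else:
        grade = "C"
    # Cross-vendor substitution on any mainstream model: not certified, whatever the index.
    if any_substitution:
        grade = cap(grade, "Not certified")
    # Same-vendor downgrade on any mainstream model: B at most.
    if any_downgrade:
        grade = cap(grade, "B")
    return grade


print(relay_grade(96.0, any_substitution=False, any_downgrade=True))  # B
print(relay_grade(98.0, any_substitution=True, any_downgrade=False))  # Not certified
```

Because a cap can only lower a grade, the order in which the two caps run does not matter, and no trust index, however high, can lift a relay past them.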

Trust index (0–100)

The relay-level trust index is a weighted average of per-model trust scores (flagships such as opus / gpt-5 weigh more than cheap tiers) and counts only models actually tested. Models outside our reference set are excluded and labelled separately; they neither help nor hurt the score.
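A minimal sketch of that weighted average, assuming made-up tier weights (the page does not publish the real ones) and representing untested or uncovered models with `trust=None`. The model names in the example are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier weights: the page says flagships weigh more but gives no values.
TIER_WEIGHTS = {"flagship": 3.0, "mid": 2.0, "cheap": 1.0}


@dataclass
class ModelScore:
    name: str
    tier: str               # "flagship", "mid" or "cheap" (illustrative tiers)
    trust: Optional[float]  # per-model trust 0-100; None = not tested or not yet covered


def trust_index(scores: list[ModelScore]) -> tuple[Optional[float], list[str]]:
    """Weighted mean of per-model trust over tested models, plus the excluded names."""
    tested = [s for s in scores if s.trust is not None]
    excluded = [s.name for s in scores if s.trust is None]
    if not tested:
        return None, excluded  # nothing measurable, so no index
    total_weight = sum(TIER_WEIGHTS[s.tier] for s in tested)
    weighted_sum = sum(TIER_WEIGHTS[s.tier] * s.trust for s in tested)
    return weighted_sum / total_weight, excluded


relay = [
    ModelScore("opus", "flagship", 10),      # flagship behaves like another vendor's model
    ModelScore("cheap-a", "cheap", 100),
    ModelScore("cheap-b", "cheap", 100),
    ModelScore("uncovered-x", "mid", None),  # outside the reference set
]
print(trust_index(relay))  # (46.0, ['uncovered-x'])
```

With these weights the substituted flagship pulls the index down to 46, whereas a plain unweighted mean of the three tested models would be 70; the uncovered model is reported separately and does not move the number either way.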

Per-model verdict labels

Each model is judged independently, with a 0–100 trust score and a clearly defined label:

Score | Label | Meaning
100 | Cryptographically verified | Native signature passes official replay; strongest evidence.
92 | Genuine · official-cloud resale | Genuine model resold via the Bedrock / Vertex official clouds; no native signature, but the channel fingerprint and multiple signals agree.
90 | Behavioral-fingerprint verified | High-confidence behavioral-fingerprint match to the official-source reference.
85 | Multi-signal verified | Multiple independent signals agree (unsigned).
75 | Genuine · unsigned | Behaviorally genuine, but without cryptographic-grade evidence.
50 | Unconfirmed | Insufficient signal to reach a confident verdict.
30 | Same-vendor downgrade suspected | Appears swapped to a cheaper same-vendor tier.
10 | Failed verification | Behaves like a different (cross-vendor) model.
5 | Signature rejected | Claimed signature failed official verification.
n/a | Not yet covered (excluded) | Outside our reference set; no verdict, excluded from the score.
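One way to keep scores and labels from drifting apart in tooling is to encode the verdict table as a single lookup. This Python enum simply transcribes the rows of that table; the member names are our own invention, and "not yet covered" carries no score so it can be filtered out before averaging.

```python
from enum import Enum
from typing import Optional


class Verdict(Enum):
    # (trust score, label) pairs transcribed from the verdict table; None = excluded.
    CRYPTO_VERIFIED = (100, "Cryptographically verified")
    OFFICIAL_CLOUD = (92, "Genuine · official-cloud resale")
    FINGERPRINT = (90, "Behavioral-fingerprint verified")
    MULTI_SIGNAL = (85, "Multi-signal verified")
    UNSIGNED = (75, "Genuine · unsigned")
    UNCONFIRMED = (50, "Unconfirmed")
    DOWNGRADE = (30, "Same-vendor downgrade suspected")
    FAILED = (10, "Failed verification")
    SIG_REJECTED = (5, "Signature rejected")
    NOT_COVERED = (None, "Not yet covered (excluded)")

    @property
    def score(self) -> Optional[int]:
        return self.value[0]

    @property
    def label(self) -> str:
        return self.value[1]

    @property
    def counts_toward_score(self) -> bool:
        # Only verdicts with a numeric score take part in the trust index.
        return self.score is not None


print(Verdict.OFFICIAL_CLOUD.score, Verdict.OFFICIAL_CLOUD.label)
```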

How we test

1. Behavioral fingerprinting

A set of probes samples the model's answer-style distribution and compares it with reference fingerprints taken from official sources, identifying which model it actually behaves like, even when it has been prompted to pass itself off as Claude.
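The real probes and reference fingerprints are kept private, so the following is only a generic nearest-reference classifier under stated assumptions: each answer is tagged with a hypothetical style category, the tags become a probability distribution, and the reference with the smallest Jensen-Shannon divergence wins. Jensen-Shannon divergence is our choice of distance for this sketch, not necessarily the one Panshi uses. In this sketch nothing depends on what the model says about its own identity, which is the property that makes disguise prompts ineffective.

```python
import math
from collections import Counter


def style_distribution(samples: list[str], categories: list[str]) -> list[float]:
    """Turn sampled answer-style tags into a probability vector over fixed categories."""
    if not samples:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return [counts[c] / len(samples) for c in categories]


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with x_i = 0 contribute nothing; where x_i > 0, m_i > 0 as well.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


def closest_reference(samples, references, categories):
    """Return (reference name, divergence) for the nearest reference fingerprint."""
    p = style_distribution(samples, categories)
    scores = {name: js_divergence(p, ref) for name, ref in references.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


categories = ["terse", "bulleted", "hedged"]  # hypothetical style tags
references = {                                # hypothetical reference fingerprints
    "model-x": [0.7, 0.2, 0.1],
    "model-y": [0.1, 0.3, 0.6],
}
samples = ["terse"] * 7 + ["bulleted"] * 2 + ["hedged"]  # what the probes observed
print(closest_reference(samples, references, categories))
```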

2. Cryptographic signature / official replay

Models that support native signatures are verified by official replay. Genuine models resold through the Bedrock / Vertex official clouds carry no native signature, so they are cross-confirmed by channel fingerprint and multiple signals instead.
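The decision flow for this step can be sketched as a small function. The boolean inputs stand in for the outcomes of the official replay and the channel checks, whose internals are not public; the function and parameter names are hypothetical, while the returned scores and labels come from the verdict table.

```python
from typing import Optional

OFFICIAL_CLOUDS = {"bedrock", "vertex"}  # resale channels named on this page


def signature_stage(
    claims_signature: bool,
    replay_passed: bool,
    channel: Optional[str],
    channel_fingerprint_ok: bool,
    signals_agree: bool,
) -> Optional[tuple[int, str]]:
    """Map signature / channel evidence to (trust score, label), or None to defer."""
    if claims_signature:
        # A claimed native signature is checked by official replay.
        if replay_passed:
            return 100, "Cryptographically verified"
        return 5, "Signature rejected"
    if channel in OFFICIAL_CLOUDS and channel_fingerprint_ok and signals_agree:
        # No native signature, but the channel fingerprint and other signals agree.
        return 92, "Genuine · official-cloud resale"
    return None  # no cryptographic-grade evidence: defer to behavioral / multi-signal checks


print(signature_stage(False, False, "bedrock", True, True))
```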

3. Multi-signal cross-check + per-model verdict

High confidence requires several independent signals (identity, latency, capability, rank tests) to agree. Each model is judged independently — one verified model never speaks for the whole relay.
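A minimal sketch of the agreement rule, assuming each signal casts a vote for the model it points at (or abstains with `None`) and that three agreeing votes count as "several". Both the vote format and `MIN_AGREEING` are illustrative assumptions, and "Mismatch" is a placeholder outcome that would then be classified as cross-vendor or same-vendor.

```python
from collections import Counter
from typing import Optional

MIN_AGREEING = 3  # hypothetical: how many independent signals must agree


def cross_check(votes: dict[str, Optional[str]], claimed: str) -> str:
    """votes maps a signal name to the model it points at (None = inconclusive)."""
    decided = [model for model in votes.values() if model is not None]
    ranked = Counter(decided).most_common()
    if not ranked:
        return "Unconfirmed"
    top_model, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Too little agreement, or a tie between candidates: no confident verdict.
    if top_count < MIN_AGREEING or top_count == runner_up:
        return "Unconfirmed"
    if top_model == claimed:
        return "Multi-signal verified"
    return "Mismatch"  # handed on to cross-vendor / same-vendor classification


votes = {"identity": "opus", "latency": "opus", "capability": "opus", "rank": None}
print(cross_check(votes, claimed="opus"))
```

Treating a tie like too little agreement keeps the check on the side of "Unconfirmed" rather than guessing.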

Why you can trust it

Per-model verdicts

Within one relay, claude may be genuine while gpt is swapped, so each model is judged on its own; one verified model never speaks for the whole relay.

Honest boundaries

Models we cannot cover are marked "not yet covered": no guessing, no false accusations. Same-vendor downgrade detection uses double-guard thresholds to avoid false positives.
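The page does not publish the double-guard thresholds, so this sketch invents two illustrative guards, a fingerprint-similarity margin toward the cheaper tier and a capability pass-rate floor, and flags a downgrade only when both trip. The guard names and values are hypothetical.

```python
# Hypothetical guard values; the real thresholds are not published.
SIMILARITY_MARGIN = 0.15  # guard 1: cheaper-tier fingerprint must win by at least this much
CAPABILITY_FLOOR = 0.60   # guard 2: capability-test pass rate must fall below this


def downgrade_suspected(sim_claimed: float, sim_cheaper: float, capability_pass_rate: float) -> bool:
    """Flag a same-vendor downgrade only when both independent guards trip."""
    fingerprint_guard = sim_cheaper - sim_claimed >= SIMILARITY_MARGIN
    capability_guard = capability_pass_rate < CAPABILITY_FLOOR
    # Either guard alone is treated as noise, which keeps false accusations rare.
    return fingerprint_guard and capability_guard


print(downgrade_suspected(sim_claimed=0.5, sim_cheaper=0.8, capability_pass_rate=0.4))
```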

Positive-only certification

We publish only verification and certification results. Poorly performing relays simply lack a badge, rank lower, or drop off the board; we never publish negative accusations (for brand and legal safety).

⚠️ Results are probabilistic signals, not legal proof. This certification is a point-in-time snapshot; a relay's backend may change at any time, and continuous assurance requires paid monitoring. Models outside our reference set are marked "not yet covered" with no verdict. We publish only positive verification: relays that do not pass are simply "not certified".

