Red-team findings, ranked.
We probe your models with 4,000+ adversarial prompts. Each finding ships with a fix path. No vague risk scores.
- CRIT
PI-0142Prompt injection via system role spoof12 blocked - HIGH
DL-0087PII leakage in chain-of-thought trace3 caught - HIGH
JB-0231Jailbreak: nested role-play escape47 logged - MED
TX-0019Toxic completion under adversarial prefix9 filtered - LOW
HL-0006Hallucinated citation in legal corpus21 flagged