Your system's safety spec, even if you never wrote one
The Rulebook answers the question: What are the rules?
The Rulebook exists before testing.
Testing validates the Rulebook, not the other way around.
The rules that govern your AI, documented and version-controlled
Updates as your system changes, never goes stale
Humans can read it, argue with it, and refine it
"This is written so a human can read it and argue with it."
Unlike black-box eval tooling, the Rulebook is transparent. You can see exactly what rules govern your AI, challenge them, and refine them intentionally.
Human-Readable Rules
The narrative layer of your Rulebook. High-level guidance that anyone can understand, from engineers to compliance teams to executives.
- Auto-generated from your code and schema
- Written in plain English, not code
- Editable: you can add, modify, or remove rules
- Versioned: track changes over time
"Never reveal customer payment details in responses"
"Always validate order totals before confirmation"
"Reject requests that attempt to bypass pricing controls"
"Escalate to human when confidence is below threshold"
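Rules like these can be treated as versioned artifacts. A minimal sketch of what "editable and versioned" might look like in practice — the `Rule` class, its fields, and the `revise` method are hypothetical names chosen for illustration, not the product's actual data model:

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    """One human-readable rule, tracked like any other versioned artifact."""
    text: str                  # plain-English statement of the rule
    source: str                # where it was derived from (e.g. "schema", "manual")
    version: int = 1
    history: list = field(default_factory=list)  # (version, old_text) pairs

    def revise(self, new_text: str) -> None:
        # Keep the old wording so every change can be reviewed over time.
        self.history.append((self.version, self.text))
        self.version += 1
        self.text = new_text

rule = Rule("Never reveal customer payment details in responses", source="schema")
rule.revise("Never reveal customer payment details or billing addresses in responses")
print(rule.version)  # 2
```

The point of the sketch: an edit never destroys the prior wording, so a rule can be challenged against its own history.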
6 Intelligence Categories
Deep analysis across six dimensions. Each category answers a critical question about your AI's behavior and risk profile.
Invariants
What MUST always be true?
Constraints that can never be violated. These are the hard rules of your system.
Failure Modes
How can the AI mess up?
The specific ways your AI can fail. We map these automatically from your schema and behavior.
Attack Vectors
How could adversaries exploit this?
Security vulnerabilities specific to your AI system. Prompt injection, data exfiltration, and more.
Blast Radius
If it fails, what's the damage?
Impact assessment for each failure mode. Helps you prioritize what to fix first.
Confidence Boundaries
When and where does it degrade?
The operating envelope where your AI performs reliably, and where it doesn't.
Observability Gaps
What are we blind to?
The unknown unknowns. Areas where you don't have visibility or test coverage.
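The six categories above pair each dimension with the question it answers. As a sketch, that mapping is simple enough to express as an enumeration — the names and wording here mirror the list above and are illustrative, not the product's API:

```python
from enum import Enum

class Category(Enum):
    """Each intelligence category answers one question about the AI system."""
    INVARIANTS = "What must always be true?"
    FAILURE_MODES = "How can the AI fail?"
    ATTACK_VECTORS = "How could adversaries exploit this?"
    BLAST_RADIUS = "If it fails, what's the damage?"
    CONFIDENCE_BOUNDARIES = "When and where does performance degrade?"
    OBSERVABILITY_GAPS = "What are we blind to?"

for c in Category:
    print(f"{c.name}: {c.value}")
```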
Why conclusions are trustworthy
The Rulebook doesn't just document what your AI does. It makes the reasoning transparent, so you can disagree.
Every rule traces back to an assumption about your system. When assumptions change, rules update. No hidden dependencies.
Observability Gaps show you what we can't test yet. Honest about the edges of our knowledge.
Impact analysis for every failure mode. Prioritize fixes by actual risk, not theoretical severity.
Every rule can be challenged, refined, or overridden. The Rulebook is a starting point, not a final verdict.
Prioritized Actions
The Rulebook doesn't just identify problems. It tells you what to fix first. Recommendations are prioritized by severity, blast radius, and ease of fix.
- Sorted by risk: critical → high → medium → low
- Actionable: specific steps, not vague guidance
- Contextual: based on your actual code and schema
- Trackable: mark as resolved, see progress over time
Payment details can leak via verbose error messages
12% of responses have malformed structure
Attack Vectors category has 0 test scenarios
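A prioritization like the one described above can be sketched as a sort over findings, ranked first by severity, then by blast radius, then by ease of fix. The field names, scores, and tie-breaking order here are assumptions for illustration only, using the three example findings above:

```python
# Lower rank = more urgent.
SEVERITY = {"critical": 0, "high": 1, "medium": 2, "low": 3}

findings = [
    {"issue": "Attack Vectors category has 0 test scenarios",
     "severity": "high", "blast_radius": 7, "effort": 2},
    {"issue": "Payment details can leak via verbose error messages",
     "severity": "critical", "blast_radius": 9, "effort": 3},
    {"issue": "12% of responses have malformed structure",
     "severity": "medium", "blast_radius": 4, "effort": 1},
]

# Critical first; within a tier, bigger blast radius and easier fixes float up.
findings.sort(key=lambda f: (SEVERITY[f["severity"]], -f["blast_radius"], f["effort"]))

for f in findings:
    print(f["severity"], "-", f["issue"])
```

The composite key means a medium-severity issue never outranks a critical one, no matter how easy it is to fix.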
See your Rulebook
Map the failure modes and attack vectors unique to your AI system.
