DD6 structures the work. CIRK governs the execution.
A proposed standard that classifies problem complexity across six dimensions to determine how much discovery is needed before an AI agent touches it.
Intake: add multi-tenant isolation to the API
I: 2 | D: 3 | S: 2 | T: 2 | P: 3 | B: 2
→ Score 14 → Standard Discovery (2–4 structured sessions)
Understand the problem before the agent touches it.
One-size-fits-all asks: "Same process for everything?"
DD6 asks: "How much does this problem need?"
The shift
AI agents can write code in seconds, but they cannot understand vague problems. What matters now is not how fast the code is written; it is how well the problem is understood before the agent touches it.
DD6 classifies that reality.
Discovery is not overhead. It is error reduction.
The quality of the input determines the quality of the output.
The model
Each intake is scored from 1 to 3 across six dimensions. The vector defines how much discovery is needed.
Intent (I): How clear is what the requester actually wants?
| I1 | Crystal clear, spec-ready |
| I2 | Partially clear — goal understood but specifics need refinement |
| I3 | Vague or ambiguous — cannot articulate the desired outcome |
Domain (D): How much specialist knowledge is needed?
| D1 | Standard domain — general engineering knowledge sufficient |
| D2 | Moderate specialization — specific business rules or patterns |
| D3 | Deep expertise — security, compliance, performance, domain science |
Stakeholders (S): How many perspectives need to align?
| S1 | Single stakeholder defines the problem |
| S2 | Two to three stakeholders — engineering + product or design |
| S3 | Multiple stakeholders — cross-functional alignment required |
Testability (T): Can an AI agent verify the outcome is correct?
| T1 | Easily testable — clear metrics, deterministic outcomes |
| T2 | Partially testable — some aspects require judgment |
| T3 | Hard to test — subjective outcomes, requires human evaluation |
Precedent (P): Have we done something similar before?
| P1 | Strong precedent — many similar examples, established patterns |
| P2 | Partial precedent — related work exists but new aspects |
| P3 | No precedent — completely new territory |
Boundaries (B): Are the boundaries of the problem clear?
| B1 | Clear boundaries — well-defined scope, explicit in/out |
| B2 | Partial boundaries — general scope understood, edges fuzzy |
| B3 | Unclear — open-ended, scope is part of what needs discovery |
Composite score = I + D + S + T + P + B (range 6–18)
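The vector and composite can be sketched in a few lines. The `DD6Vector` class and its field names are illustrative, not part of the standard; only the 1–3 ranges and the sum follow the tables above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DD6Vector:
    """One DD6 intake score: each dimension ranges from 1 to 3."""
    intent: int        # I: how clear is the desired outcome?
    domain: int        # D: how much specialist knowledge is needed?
    stakeholders: int  # S: how many perspectives must align?
    testability: int   # T: can an AI agent verify the outcome?
    precedent: int     # P: have we done something similar before?
    boundaries: int    # B: are the scope boundaries clear?

    def __post_init__(self):
        for name, value in vars(self).items():
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1-3, got {value}")

    @property
    def composite(self) -> int:
        """Composite score = I + D + S + T + P + B (range 6-18)."""
        return sum(vars(self).values())

# The multi-tenant isolation intake from the top of the page:
intake = DD6Vector(intent=2, domain=3, stakeholders=2,
                   testability=2, precedent=3, boundaries=2)
print(intake.composite)  # 14
```

The example instance mirrors the intake at the top of the page (I: 2, D: 3, S: 2, T: 2, P: 3, B: 2 → 14).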
"Score the discovery reality, not the political preference."
DD6 Scoring Guidance
What DD6 is not
DD6 governs how much discovery to do — not the discovery itself.
DD6 in 60 seconds
1. Take any unit of work entering your backlog: a feature, a bug, a refactor, a migration.
2. Assign I, D, S, T, P, B values from 1 to 3. Ask: how clear? how deep? how many stakeholders? how testable? how precedented? how bounded?
3. The vector maps to a discovery mode (skip, focused, structured, iterative, or stabilize). The depth is proportional to the complexity.
4. Teams gain a shared language: "This is high D." "Low P, we need exploration." "I3 — clarify intent first."
Rule of thumb
Discovery policies
The vector defines what happens — not just how complex the problem is.
Skip: no discovery session needed. Generate the spec directly from intake using a template.
Focused: 1–2 focused sessions. Clarify the dominant dimension. Structured Q&A, not open-ended exploration.
Structured: 2–4 sessions with defined phases. Intent → scope → risk progression. Exit criteria per phase.
Iterative: 4+ sessions. Hypothesis testing and validation. Architectural exploration. Multi-stakeholder workshops.
Stabilize: human-led stabilization. No structured discovery during crisis. Triage → stabilize → then classify.
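A minimal sketch of the score-to-mode mapping. DD6 names the five modes, but the exact composite cutoffs below are assumptions for illustration; the only data point taken from this page is that a score of 14 lands in the 2–4 session tier.

```python
# Hypothetical cutoffs: the standard defines the five modes, but these
# exact score boundaries are assumptions, not part of DD6.
def discovery_mode(composite: int, crisis: bool = False) -> str:
    """Map a DD6 composite score (6-18) to a discovery mode."""
    if crisis:
        return "stabilize"   # crisis path: triage first, classify after
    if composite <= 8:
        return "skip"        # generate spec directly from a template
    if composite <= 11:
        return "focused"     # 1-2 sessions on the dominant dimension
    if composite <= 14:
        return "structured"  # 2-4 sessions: intent -> scope -> risk
    return "iterative"       # 4+ sessions, hypothesis testing

print(discovery_mode(14))  # structured
```

Note that stabilize is reached by a flag rather than a score, matching the policy above: during a crisis, classification happens after triage, not before.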
Policy rules by dimension signal
Examples
Each vector maps to a concrete discovery depth, with reasoning for each dimension.
Add dark mode toggle to settings page
Add real-time notifications for governance violations
Intermittent data corruption in batch processing
Design principles
MIT licensed. No vendor lock-in. Any team, tool, or platform can implement it.
Six dimensions, scored 1–3. No certifications, no training required.
Discovery depth matches problem complexity. Simple problems skip. Complex problems get what they need.
Works standalone or as upstream input to CIRK. Problem → DD6 → Discovery → Spec → CIRK → Execution.
FAQ
No. DD6 governs how much discovery to do. It does not replace the sessions themselves.
Not directly. DD6 maps to discovery depth (number of sessions), which has time implications. But duration is derived from depth, not scored directly.
Yes. DD6 is a standalone standard for problem classification. However, the full value emerges when DD6 (upstream) feeds into CIRK (downstream) as a unified framework.
No. Orbit618 is one possible implementation environment for DD6, but DD6 is designed as a standalone open standard.
No. DD6 is a standard and a shared classification language. Products and platforms may implement it, but the model itself is implementation-agnostic.
CIRK has four dimensions because execution risk can be captured in four independent axes. Problem complexity requires more axes because the problem space has more independent variables: intent, domain, stakeholders, testability, precedent, and boundaries each vary independently and affect discovery depth differently.
Cynefin classifies problems into domains (Clear, Complicated, Complex, Chaotic) but does not provide a scoring model or dimensions specific to software. DD6 uses Cynefin's domains as depth labels but adds the six-dimensional scoring that makes classification systematic and auditable.
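DD6 borrows Cynefin's domains as depth labels. One plausible assignment of modes to domains, as a sketch (this exact mapping is an assumption, not published by the standard):

```python
# Assumed mode-to-Cynefin-domain mapping, for illustration only.
CYNEFIN_LABEL = {
    "skip": "Clear",            # well-understood, spec-ready work
    "focused": "Complicated",   # expertise resolves it
    "structured": "Complicated",
    "iterative": "Complex",     # probe, sense, respond
    "stabilize": "Chaotic",     # act first, classify later
}
print(CYNEFIN_LABEL["iterative"])  # Complex
```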
Yes. DD6 supports heuristic scoring (rules-based) and AI-assisted scoring (a model evaluates the intake and assigns scores). AI-assisted scoring is especially useful for high-volume intake triage. See scoring guidance for practical advice and examples.
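To make "heuristic scoring" concrete, here is a toy rules-based scorer for the Intent (I) dimension only. The keyword lists and the word-count threshold are invented for illustration and would need calibration against a team's real intake.

```python
# Illustrative keyword rules; not part of DD6.
VAGUE_MARKERS = ("somehow", "improve", "better", "faster", "cleanup")
SPEC_MARKERS = ("must", "shall", "acceptance criteria", "given/when/then")

def score_intent(intake_text: str) -> int:
    """Assign an I score (1-3) to raw intake text via keyword rules."""
    text = intake_text.lower()
    if any(marker in text for marker in SPEC_MARKERS):
        return 1  # I1: crystal clear, spec-ready
    if any(marker in text for marker in VAGUE_MARKERS) or len(text.split()) < 5:
        return 3  # I3: vague or ambiguous
    return 2      # I2: partially clear, specifics need refinement

print(score_intent("make the dashboard better somehow"))  # 3
```

An AI-assisted scorer would replace these rules with a model call that reads the intake and returns the six values, with the same 1–3 contract.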
It depends on your context. For teams with many stakeholders, S is often the strongest signal. For teams with novel domains, D and P dominate. For vague requirements, I and B matter most.
Open questions
These are tensions we are still debating. If you have answers or counterexamples, we want to hear them.
Should DD6 include a "risk" dimension?
DD6 measures problem complexity, not execution risk (that is CIRK). But some teams argue that risk awareness during discovery changes how much exploration is needed. Should risk be a seventh dimension, or does CIRK already cover it?
Can AI reliably score DD6 without human calibration?
AI-assisted scoring works for high-volume triage, but accuracy depends on context the model may not have. Is human calibration always needed for the first pass, or can teams trust AI scoring from day one?
How does DD6 interact with existing triage processes?
Most teams already have intake workflows: Jira triage, backlog grooming, planning poker. DD6 adds a classification layer. Can it slot in without disrupting those flows, or does it require process change?
Have a perspective? Join the discussion on GitHub →
MIT licensed. No dependencies. Works with anything. We want to see where it fails.