Defining Outcomes for a Virtual Delivery Center: From Spec to Acceptance Criteria

VDC engagements run on outcome-based delivery. The hardest part isn't the model — it's writing the outcomes well. Vague outcomes produce vague delivery. Here's the structural pattern that turns scope into testable acceptance criteria.

Outcome-based delivery sounds simple until you have to write the outcomes. "Replace the legacy auth system" looks like an outcome but isn't testable. "Implement OAuth2 endpoint X per RFC 6749" is testable but isn't an outcome; it's a task. Good outcomes live in the space between vague-and-aspirational and granular-and-prescriptive, and that gap is where most struggling VDC engagements go wrong.

This piece walks through the four-layer structure that turns business intent into testable acceptance criteria, the patterns that work, the patterns that fail, and how those acceptance criteria become the unit that drives Delivery Unit (DU) consumption — closing the loop from business intent to invoice line. A typical VDC engagement only works as outcome-based delivery if these layers are in place.

The four layers of outcome definition

Layer 1: Business intent

Why does this work matter? What does shipping it enable? What's the customer or business problem being solved? This is the layer where executives align. It's deliberately not testable — testability comes lower.

Example: "We want to reduce friction in our user-onboarding flow because conversion is dropping at the auth step."

Layer 2: Engagement scope

What's in scope, what's not, what depends on what. This is the layer where engineering leadership and the pod's tech lead align. Translates business intent into technical boundaries.

Example: "Build a new OAuth2-based auth system that handles email + social login, supports SSO across our existing 4 web apps, and runs alongside the legacy system for 90 days before cutover. Out of scope: mobile auth, MFA enrollment flows, the password-reset UX."

Layer 3: Milestone-level outcomes

Each milestone (typically 2–4 weeks of work) has an outcome statement. Outcome-level, not task-level. Specific enough to be testable; abstracted enough to invite the pod's problem-solving.

Example milestone outcome: "Users can sign in with email + password through the new OAuth2 flow on Web App #1, with sessions persisting for 7 days and logout working cleanly."

Layer 4: Acceptance criteria

The testable conditions for milestone sign-off. Concrete, observable, binary. Either each criterion is met or it isn't.

Example acceptance criteria for the milestone above:

  • Email + password login flow ships on Web App #1.
  • Login latency p99 under 300ms in the staging environment.
  • Sessions persist 7 days; expiration handled cleanly.
  • Logout invalidates session both client- and server-side.
  • All existing E2E tests pass; 5 new E2E tests for the auth flow pass.
  • Security review completed; no high-severity findings.
  • Documentation updated in the auth-system runbook.

This is the layer where milestone sign-off happens. Each acceptance criterion is testable; "did this criterion pass" has a binary answer.
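To make the binary nature concrete, here is a minimal sketch of a milestone checklist in Python. The Criterion and Milestone classes and their fields are illustrative assumptions, not a platform API; the point is that sign-off reduces to "all criteria pass," with no partial credit.

    from dataclasses import dataclass, field

    @dataclass
    class Criterion:
        """One acceptance criterion: a description plus a binary result."""
        description: str
        passed: bool = False  # either met or not -- no partial credit

    @dataclass
    class Milestone:
        outcome: str
        criteria: list[Criterion] = field(default_factory=list)

        def accepted(self) -> bool:
            """Sign-off is binary: every criterion must pass."""
            return bool(self.criteria) and all(c.passed for c in self.criteria)

    # The Layer 4 example above, expressed as a checklist:
    m = Milestone(
        outcome="Email + password login via OAuth2 on Web App #1",
        criteria=[
            Criterion("Login flow ships on Web App #1"),
            Criterion("Login latency p99 < 300ms in staging"),
            Criterion("Sessions persist 7 days; expiration handled cleanly"),
            Criterion("Logout invalidates session client- and server-side"),
            Criterion("All existing E2E tests pass; 5 new auth E2E tests pass"),
            Criterion("Security review complete; no high-severity findings"),
            Criterion("Auth-system runbook updated"),
        ],
    )
    assert m.accepted() is False  # nothing verified yet -- no sign-off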

What makes outcomes good

Five tests for a well-defined outcome:

  1. Testable. Can someone besides the pod determine whether it's met? Vague outcomes like "improve performance" fail this test. Specific ones like "p99 latency under 300ms" pass.
  2. Outcome-shaped, not task-shaped. Describes what the work produces, not what the work does. "User auth flow ships" is outcome-shaped. "Implement function X" is task-shaped.
  3. Bounded. Has a clear scope boundary. Open-ended outcomes ("make the auth system better") drift into scope creep.
  4. Independent of implementation choices. Doesn't prescribe how the pod implements. The outcome should be true regardless of which library, pattern, or architecture the pod uses.
  5. Acceptance is binary. No "mostly accepted" or "accepted with reservations." Either all criteria pass or the milestone doesn't sign off.

What makes outcomes bad

Five anti-patterns that produce friction at acceptance time:

  • Vague. "Improve user experience." Untestable. Disagreements at acceptance are inevitable.
  • Task-shaped. "Refactor the auth module." The pod ships a refactor; was it the refactor you wanted? Hard to say.
  • Implementation-prescriptive. "Use Auth0 for OAuth2." Constrains the pod's solution space and conflates "what" with "how."
  • Multi-criteria with implicit AND. "Ship login flow and improve security and refactor the user model." Three things bundled; if one isn't done, is the milestone done?
  • Subjective acceptance. "Login experience feels smooth." Whose feeling? Replace with measurable criteria.

The acceptance-criteria writing pattern

Useful template for converting outcomes into acceptance criteria:

Given [precondition], when [action], then [observable result].

Example: "Given a user with valid credentials, when they submit the login form on Web App #1, then they're redirected to the dashboard within 300ms with a valid session cookie set."

This BDD-style pattern (borrowed from behavior-driven development but adapted to milestone acceptance) forces the criteria to be:

  • Concrete (not "users can log in" but "users with valid credentials submitting the form are redirected to the dashboard").
  • Testable (someone besides the pod can verify the precondition + action + result).
  • Independent of implementation (no mention of which auth library, which session pattern).
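The same pattern maps directly onto an automated check. Below is a hedged sketch in pytest style: the client fixture, the /login and /dashboard paths, and the response fields are hypothetical stand-ins for whatever harness the pod actually uses, and the single-request timing check is a stand-in for a proper latency measurement.

    import time

    def test_valid_login_redirects_to_dashboard(client):
        # Given: a user with valid credentials
        credentials = {"email": "user@example.com", "password": "correct-horse"}

        # When: they submit the login form on Web App #1
        start = time.monotonic()
        response = client.post("/login", data=credentials)
        elapsed_ms = (time.monotonic() - start) * 1000

        # Then: redirected to the dashboard within 300ms, with a session cookie set
        assert response.status_code == 302
        assert response.headers["Location"] == "/dashboard"
        assert elapsed_ms < 300
        assert "session" in response.cookies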

How outcomes become Delivery Units

Outcome definition isn't just a project-management exercise. It's the input that drives the entire DU pricing system. Every layer above maps to a step in the consumption chain:

  • Layer 2 (engagement scope) drives initial DU sizing. The Instant Proposal sizes the engagement in DUs — typically a band that lands the engagement in the right tier (Starter / Small / Scale / Enterprise). The customer sees the DU count before authorizing any work.
  • Layer 3 (milestone outcomes) drives milestone-level DU allocation. Each milestone outcome carries an estimated DU count. Customers see, before the milestone starts, how many DUs that milestone will consume from the wallet on acceptance.
  • Layer 4 (acceptance criteria) drives actual DU consumption. The customer's wallet only debits when the acceptance criteria are met. Work in progress doesn't consume DUs. Failed-acceptance work doesn't consume DUs. The wallet ticks down only as accepted output accumulates.

This is why outcome definition is mechanically essential, not just operationally helpful. Vague outcomes break the consumption gate. If "is this done" can't be answered with a binary yes or no, the DU primitive can't gate consumption against acceptance, and the engagement collapses back into hourly billing dynamics. Sharp outcomes are the structural prerequisite for outcome-based delivery to work as a pricing model, not just as a delivery method.

The platform-side advantage: failed acceptance triggers re-delivery (the platform pays for the re-do, the customer's wallet stays still). That asymmetry only works if "failed acceptance" is unambiguous. Layer 4 is what makes it unambiguous.
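As a sketch of that gate logic (names and fields assumed for illustration; this is not the platform's billing code), the settlement rule fits in a few lines: the wallet debits the pre-flight estimate only on acceptance, and failed acceptance leaves the balance untouched.

    from dataclasses import dataclass

    @dataclass
    class Wallet:
        balance_du: int  # pre-purchased Delivery Units remaining

    def settle_milestone(wallet: Wallet, all_criteria_pass: bool, estimated_du: int) -> str:
        """Consumption gate: debit DUs only when acceptance is binary-true."""
        if all_criteria_pass:
            wallet.balance_du -= estimated_du  # accepted output consumes the estimate
            return "accepted: DUs consumed"
        # Failed acceptance triggers re-delivery at the platform's cost;
        # work in progress and failed work never touch the wallet.
        return "failed acceptance: re-delivery, wallet unchanged"

    wallet = Wallet(balance_du=60)
    settle_milestone(wallet, all_criteria_pass=True, estimated_du=20)
    assert wallet.balance_du == 40
    settle_milestone(wallet, all_criteria_pass=False, estimated_du=20)
    assert wallet.balance_du == 40  # a failed milestone costs the customer nothing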

Who writes the outcomes

Layered authorship matches the layered structure:

  • Layer 1 (business intent): customer's product/business leadership.
  • Layer 2 (engagement scope): customer's engineering leadership + pod's tech lead, in collaboration. Engagement architect supports the DU sizing.
  • Layer 3 (milestone outcomes): pod's delivery manager facilitates; pod's tech lead drives technical scope; customer's product owner approves. Pre-flight DU count published per milestone.
  • Layer 4 (acceptance criteria): customer's product owner writes; pod's tech lead reviews for testability and feasibility. Becomes the binary gate for DU consumption.

The most common failure: the customer writes layers 1 and 2, then hands them to the pod and says "figure out the milestones." Layer 3 should be collaborative; the pod has implementation context that informs how to slice the work into shippable units that the DU primitive can size cleanly.

How outcomes evolve during the engagement

Outcomes are written at engagement start, but they're not frozen. Three patterns for evolution:

  • Discovery-driven refinement. The first milestone reveals that the original scope assumed something incorrect. Subsequent milestones recompose to reflect what was learned. DU counts re-size against the new outcome shape; the existing wallet absorbs the shift.
  • Business-driven scope shift. Customer business priorities change mid-engagement. New work emerges; existing work deprioritizes. Outcomes recompose. Unused DUs from deprioritized work flow into new work without contract amendment.
  • Implementation-driven scope shift. The pod discovers that an outcome as written would require materially more delivery than originally sized. Engagement decides whether to commit the higher DU count, descope, or alter the outcome.

The platform supports recomposition without contract amendments — that's the elasticity advantage of the model. The DU primitive absorbs scope evolution because the unit isn't engagement-specific; it's universal across milestones, engagements, and tier bands. See VDC contracting / SOW for the contractual mechanics.
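A minimal sketch of that recomposition, assuming milestones are tracked as name-to-DU pairs (an illustrative structure, not a platform schema): dropping a deprioritized milestone releases its DUs back into the plan, and new work is sized in the same unit, so no contract amendment is needed.

    plan = {"legacy-cutover": 18, "sso-rollout": 22}  # milestone -> estimated DUs

    def recompose(plan: dict, drop: str, add: str, add_du: int) -> dict:
        """Swap a deprioritized milestone for new work, sized in the same unit."""
        released = plan.pop(drop)   # unused DUs flow back into the plan
        plan[add] = add_du          # new work draws from the same wallet
        print(f"released {released} DU; plan now totals {sum(plan.values())} DU")
        return plan

    recompose(plan, drop="legacy-cutover", add="mfa-enrollment", add_du=15)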

The first milestone is special

The first milestone of any engagement should be deliberately conservative — well-bounded, achievable inside one sprint, and chosen to validate the operational rhythm rather than to ship maximum scope. Reasons:

  • The pod is still ramping. Subscale productivity is normal in the first sprint.
  • Acceptance pattern is being calibrated. First-milestone sign-off teaches both sides what acceptance feels like — and what DU consumption against acceptance feels like.
  • Confidence accrues from early hits. A milestone missed early creates lasting friction; a milestone hit early creates trust that compounds.

Don't try to make the first milestone the biggest. Make it the cleanest.

Frequently asked questions

How granular should milestones be?

2–4 weeks of work each. Smaller milestones add ceremony overhead. Larger ones reduce predictability. Engagement-specific calibration usually settles on 3-week milestones with 15–25 DUs of consumption per milestone.

What if scope changes mid-milestone?

Two patterns. Small changes: pod absorbs within the current milestone if feasible — no DU re-size needed. Large changes: pause the milestone, re-size in DUs, restart. Don't quietly drift the scope while the milestone runs — that breaks acceptance and breaks the DU consumption gate.

Who breaks ties when the customer and pod disagree on whether acceptance criteria are met?

The pod's delivery manager facilitates the conversation. If the disagreement is unresolved, it escalates to the platform's calibration board. Most disputes resolve at the DM level by clarifying which criterion is contested and what evidence resolves it. The platform errs toward re-delivery rather than disputed sign-off: bounded risk on the platform side, clean experience on the customer side.

Should outcomes include non-functional requirements (performance, security, etc.)?

Yes — bake them into acceptance criteria. "Login latency p99 under 300ms" is a non-functional requirement made testable. Non-functional requirements that aren't in acceptance criteria are likely to be missed, and they'll show up later as "the work was done but it doesn't work right" — exactly the kind of dispute the DU consumption gate is designed to prevent.

Where to start

If you're scoping a new VDC engagement, draft the four layers before kickoff. Bring the layers to the kickoff session for collaborative refinement. The pod's tech lead and DM will catch implementation issues early — better in scoping than at first milestone. The DU pre-flight estimate becomes meaningful only when the layers are tight; rushing the layers to get to a number produces a number you can't trust.

For help structuring outcomes for your specific engagement, schedule a 30-minute call. For the operational context, see onboarding a VDC: 14 days and VDC governance.

Krishna Vardhan Reddy

Founder, AiDOOS

Krishna Vardhan Reddy is the Founder of AiDOOS, the pioneering platform behind the concept of Virtual Delivery Centers (VDCs) — a bold reimagination of how work gets done in the modern world. A lifelong entrepreneur, systems thinker, and product visionary, Krishna has spent decades simplifying the complex and scaling what matters.
