QA Scenarios (Pack Self‑Test)
Version: v1.0
Purpose: run the pack end‑to‑end against representative scenarios to find gaps.
How to use:
- Pick a scenario.
- Classify it in 02a-ai-use-case-matrix.md.
- Create a Use‑Case Card (07-use-case-card-template.md).
- Add/update risk register entries (03-people-harm-risk-register.md).
- Confirm controls are sufficient (04-controls-map.md) and appear in the 30‑day plan (05-30-day-implementation-checklist.md).
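The classification step can be sketched as a simple decision rule. Note this is a minimal sketch, not the actual matrix: the tier meanings (D = data sensitivity, O = output autonomy, C = consequence severity) and the thresholds below are assumptions inferred from the expected classifications in the scenarios that follow.

```python
# Hypothetical sketch of the D/O/C default-decision logic in
# 02a-ai-use-case-matrix.md. Tier semantics and cutoffs are assumptions
# inferred from the scenario examples below, not the real matrix.

def classify(d: int, o: int, c: int) -> str:
    """Return the default decision for a (D, O, C) tier combination."""
    if d >= 3:
        # Restricted/personal data (D3): prohibited by default;
        # only a documented governance exception (C-G1) can override.
        return "Prohibited (default)"
    if c >= 1:
        # Lower-sensitivity data but real-world consequences:
        # allowed only with controls (HITL, QA sampling, etc.).
        return "Conditional"
    return "Approved"

# Scenario A (D3/O0/C2) and B (D3/O1/C1) both land on "Prohibited (default)";
# Scenario C (D1-D2/O1/C1) lands on "Conditional".
print(classify(3, 0, 2), "|", classify(2, 1, 1))
```

Checking each scenario's expected classification against a rule like this is itself a useful self-test: if the matrix and the rule disagree, either the matrix or the scenario write-up has a gap.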
Scenario A — HR screening / candidate ranking
Description: HR wants an LLM to score/rank candidates and draft interview recommendations.
Expected classification: D3 (personal data) / O0 / C2 → Prohibited (default).
Checks:
- Policy explicitly prohibits automated HR decisions by default (C‑H2).
- Governance path for exceptions is documented (C‑G1) and requires strict audit trail (C‑L2) and bias testing (C‑Q2).
- Training deck includes this example and states the "prohibited by default" position clearly.
Common gaps to look for:
- “But it’s only a recommendation” ambiguity.
- Missing contestability / appeal route.
- Missing documentation of decision rationale.
Scenario B — Data leak via unapproved tool
Description: A staff member pastes a customer ticket containing phone number + address into a public chatbot to draft a reply.
Expected classification: D3 / O1 / C1 → Prohibited (default), unless the tool is explicitly approved for restricted data with DLP controls and a privacy review.
Checks:
- Policy: clear “never paste restricted data into unapproved tools.”
- Controls: C‑D1/C‑D2 + DLP/secret guidance + incident reporting (C‑I1).
- 30‑day plan includes: approved tools register, DLP guidance/blocks, training + quiz.
Common gaps to look for:
- Not defining “Restricted” in plain terms (PII + secrets).
- No near‑miss reporting for copy/paste events.
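A minimal pre-paste check for the C‑D1/C‑D2 + DLP guidance above might look like the following. The regex patterns and category names are illustrative assumptions only; a real DLP control would use a vetted detection library and cover far more categories.

```python
import re

# Illustrative patterns only -- NOT a complete PII/secret detector.
# A production DLP control would use a vetted library and broader coverage
# (addresses, names, national IDs, API keys, etc.).
RESTRICTED_PATTERNS = {
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "secret": re.compile(r"(?i)\b(api[_-]?key|password|token)\b\s*[:=]"),
}

def restricted_findings(text: str) -> list[str]:
    """Return the categories of restricted data detected in `text`."""
    return [name for name, pat in RESTRICTED_PATTERNS.items() if pat.search(text)]

# Scenario B trigger: a support ticket with a phone number pasted verbatim.
ticket = "Customer at 12 High St, phone +1 (555) 123-4567, asked about her invoice."
findings = restricted_findings(ticket)
if findings:
    print(f"Blocked: restricted data detected ({', '.join(findings)}); "
          "use an approved tool and report the near-miss (C-I1).")
```

Even a crude check like this makes the "Restricted in plain terms" gap concrete: if you cannot enumerate patterns for it, staff cannot be expected to recognise it either.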
Scenario C — Hallucinated guidance causes customer harm
Description: Support uses AI to draft a response; model confidently suggests wrong troubleshooting steps, causing a customer outage.
Expected classification: D1–D2 / O1 / C1 → Conditional.
Checks:
- HITL required before sending (C‑H1).
- QA sampling and hallucination monitoring (C‑Q1) exist, with defined metrics.
- Grounding rules exist (e.g., “must cite KB; if unsure, escalate”).
- Incident response includes “AI incident” definition and post‑incident review (C‑I1/C‑I2).
Common gaps to look for:
- No measurable threshold for unacceptable hallucination rate.
- No kill‑switch / rollback runbook (C‑I3) or unclear ownership.
- No feedback loop to improve KB/prompting.
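The first gap above (no measurable threshold) can be closed with a simple sampling check that also wires in the C‑I3 rollback trigger. This is a sketch under stated assumptions: the 2% threshold, the sample record shape, and the action strings are placeholders, not values from the pack.

```python
# Hypothetical QA sampling check for C-Q1 with a C-I3 rollback trigger.
# The 2% threshold, record shape, and action names are placeholder assumptions.

HALLUCINATION_THRESHOLD = 0.02  # max acceptable rate over a reviewed QA sample

def hallucination_rate(sample: list[dict]) -> float:
    """Fraction of sampled AI drafts flagged by reviewers as hallucinated."""
    if not sample:
        return 0.0
    return sum(1 for r in sample if r["hallucinated"]) / len(sample)

def review(sample: list[dict]) -> str:
    """Compare the sampled rate to the threshold and name the required action."""
    rate = hallucination_rate(sample)
    if rate > HALLUCINATION_THRESHOLD:
        # C-I3: invoke the kill-switch / rollback runbook and open an AI incident.
        return f"ROLLBACK: rate {rate:.1%} exceeds {HALLUCINATION_THRESHOLD:.0%} threshold"
    return f"OK: rate {rate:.1%} within threshold"
```

Whatever the actual numbers, the self-test point is that a threshold, an owner, and a triggered action must all be written down; without them, "monitoring" never converts into a rollback decision.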