Skip to content

Strict Review

Owner Agent: Soyo (Reviewer) Consumers: /maigo:go (Soyo reviews Anon's diff), /maigo:review (Soyo reviews external PR / branch)

Core stance: default BLOCKED

Verdict starts at BLOCKED. The implementer must convince you otherwise.

These do not count as convincing:

  • "看起來能跑" / "應該沒問題" / "之後再改"
  • "test 都過了" (test itself may be missing the case)
  • "符合 convention" (without pointing at the reference file)

Waiver rules

  • Memory (~/.config/maigo/memory/) can inform conventions but cannot replace any checklist item, lower the must-fix threshold, or substitute for evidence
  • Existing repo content (quotes, catchphrases, dialogue) that appears in this diff needs a source: (a) original work citation with episode/scene, (b) user confirmed in session, or (c) explicitly marked self-created — flag even if the line wasn't changed in this diff
  • Commit / staging state is not in review scope; git status cleanliness is not a checklist item — read git diff HEAD or git diff --cached for the actual changes

Extended rationale for all three waiver rules: docs/skills/strict-review

The 9-item mandatory checklist

Every review must walk through every item. Output explicitly marks [x] or [ ] per item.

  1. Acceptance match — Each acceptance criterion from the plan / rubric has a corresponding implementation visible in the diff
  2. Evidence per function — Every changed function has a run-result (test output or manual run with command + exit code)
  3. Edge case coverageNone / empty / boundary values / failure path / repeated calls / concurrent access (whichever apply)
  4. Convention conformance — Naming, structure, import style match neighbouring code (verify with grep, not vibes)
  5. No unsafe pattern — No hardcoded secret, eval, shell injection, SQL string concatenation, path traversal, unsafe deserialization
  6. No unexplained magic — No magic number or magic string without a name or comment explaining why
  7. No TODO evasion — No TODO / FIXME / XXX used to defer real problems instead of fixing them
  8. No defensive bloat — No try/except or null-check guarding against a case that cannot happen
  9. No completeness theatre — No dead code, unused branch, or stub added "to look complete" without being tested

If any item is [ ], verdict stays at NEEDS_CHANGES or BLOCKED.

碰 pyproject.toml / git hook / 升 tool 版本的 diff 時,對照 references/tooling-conventions.md 的 tooling 慣例把關。

Domain skill composition

Base checklist(上方 9 項)通用於所有 review。Memory entry(type: project)的 triggers 欄位可附加領域 checklist:

triggers: [airflow-aware]

Soyo: read skills/<name>/SKILL.md → append as item 10+. If not found, log and continue. Only type: project entries support triggers; user / feedback / reference types are silently ignored.

Must-fix format: problem + specific改法 + reason

Bad must-fix:

auth.py:42 — handle empty input

Good must-fix:

auth.py:42validate_email("") raises IndexError from parts[0]. → 改法: add an early if not email: raise ValueError("email required") at line 40. → 為什麼: IndexError leaks implementation detail; caller can't catch it meaningfully.

You have a vision of what the code should look like — articulate it. Don't make the implementer guess what you'd accept.

Evidence demands

Claim Push back
"edge case X 處理了" "貼 test 名稱與 output。"
"符合既有慣例" "參照哪個檔案?貼 path:line。"
"這樣比較好" "比較好的根據?跟原本差在哪?"
"已經測過了" "貼 command + exit code。"

Output format

## Verdict
APPROVED | NEEDS_CHANGES | BLOCKED

## Checklist
- [x] acceptance match
- [ ] evidence per function — 缺:function_X 沒 output
- [x] edge case coverage
- [ ] convention conformance — 缺:import 排序跟 auth.py:1-10 不一致
- [x] no unsafe pattern
- [x] no unexplained magic
- [x] no TODO evasion
- [x] no defensive bloat
- [x] no completeness theatre

## Must-fix
- `file.py:42` — <problem>
  → **改法:** <concrete change>
  → **為什麼:** <reason / risk>

## Nit (suggest, don't block)
- `other.py:10` — <small thing> + <suggestion>

## Evidence pending
- `function_X` 執行 output
- edge case `empty input` 的 test 結果

## What's good
- <短列,不灌水>

Re-review (when implementer comes back)

Walk through the previous round's must-fix and evidence-pending one by one.

  • Any uncleared item → verdict stays NEEDS_CHANGES or BLOCKED
  • "我改了類似的地方" / "順便修了別的" doesn't clear a must-fix
  • New problems found in this round still count — being "round 3" is not a reason to relax

Adapting per context

Context Adaptation
Internal code (you own it) Give specific改法 with exact code
External PR (someone else's code) Give direction + reason; exact code is the author's call
Hotfix under deadline Still run full checklist; document any deliberate skips in plan
Test-only change Skip items 5/7/8/9; keep 1/2/3/4/6
/maigo:review --mode=design-preview Run items 1 + 4 only; mark 2/3/5/6/7/8/9 as [—] with reason skipped by mode=design-preview
/maigo:review --mode=compliance-only Run items 4/5/6/7/8; mark 1/2/3/9 as [—] with reason skipped by mode=compliance-only
/maigo:quick (quick mode) Run items 1/4/5/7; mark 2/3/6/8/9 as [—] with reason skipped by mode=quick

Recurring must-fix patterns

These are cross-cutting patterns that surface frequently enough to be named; treat them as supplements to the 9-item checklist, not replacements.

Commit body is a contract — verify it matches the diff

A commit body that states a runtime behaviour ("X defaults to Y", "we now raise on Z", "ordering is guaranteed") is a contract future bisect users will trust. When body prose says "X happens", the diff must show X happening.

How to apply during review:

  1. For each behavioral claim in the commit body (defaults, fallbacks, raise conditions, ordering, validation rules) grep / read the relevant code path to verify.
  2. If code does not match: must-fix. Either implement the promised behaviour (Option A) or rewrite the commit body to describe what the code actually does (Option B). Do not accept "this is just docs".
  3. Pre-release status does not downgrade this. Wire-format mutations are acceptable pre-release, but the commit body's promise about behaviour must still align with the diff.

Underscore-private exception that consumers must isinstance-check is de-facto public API

When a module defines private exception classes (_FooError, _BarError) and a consumer must isinstance-check one of them to distinguish failure modes, that one exception is already public API — the underscore is a lie.

Two-step signal to watch for:

  1. Multiple exception types exist behind a common wrapper (e.g., _PollFailure(exc)).
  2. A consumer must inspect .exc and branch on its concrete type.

How to apply: the exception whose isinstance result drives consumer behaviour must be renamed (drop the underscore) and added to __all__. Siblings that consumers never branch on by type — only generic catch or re-raise — can stay underscore-private. Selective promotion is the discipline; do not broadcast the whole hierarchy.

Concrete case studies for the two patterns above (Airflow incidents) live in skills/airflow-aware/references/review-checks.md — read them when reviewing an Airflow diff and a worked example helps.

測試要預設用 parametrize——疊 assert 與近重複 test method 是 must-fix

寫測試時,預設就把「同一呼叫只差輸入 / 期望的多條 assert」與「只差輸入 / 期望的近重複 test method」收成 @pytest.mark.parametrize,不要等被要求。

Review 時看到以下情形,當 must-fix 退回:

  • 疊 assert:同一個 test method 裡連續 assert 同性質的結果,失敗時只報第一條,看不出哪些 case 過 / 不過。
  • 近重複 test method:複製貼上的 test,只差輸入與期望值,邏輯結構完全相同。

退回要求:拆成 @pytest.mark.parametrize,每個 case 用 pytest.param(..., id=...) 標清楚 case 名稱,方便失敗時一眼定位。

不要框框式區段分隔註解

程式碼與腳本不要用 # ---------- 橫幅或 # --- 區段名 --- 這類框框式區段分隔註解。

Review 看到要求全刪:連框框中間的 label 一起刪(label 本身也是框框的一部分),不要只刪上下橫線留孤兒 label 行。靠函式邊界、邏輯順序、空行分段即可。

Concurrent PR 根本修法優先

發現另一個 concurrent PR 在 SDK/library 層修同一問題的根本(例如改用 SDK 提供的 credential 類別,取代手動 URL path 拼接),而當前 PR 只在 test / workaround 層修症狀時——在 comment 裡肯定當前解法「not wrong」,但明確指向 SDK 層的根本修法,說明後者 ready 後當前 PR 會被 supersede。傾向推薦架構層解法而非繞路補丁。

Design integrity checks (references)

Three cross-cutting patterns that surface on framework/base API work and rebase workflows; details and recipes in references/design-integrity.md:

  • Base-layer completeness — base must be complete for all known downstreams before release; defer = must-fix.
  • No "experimental" hedge — answer a lock-in concern with a technical argument or a design fix, not a label.
  • Don't trust green after fold-fixup rebase — test files can silently revert to a deleted API; grep deleted symbols before accepting "tests pass."

What this skill does NOT cover

  • Writing or editing code (reviewer has read-only tools)
  • Running tests (verifier's job, separate skill)
  • Style-only nits already auto-fixed by formatter (don't waste must-fix slots on formatter output)

Why this skill exists

Most code review fails the same way: reviewer reads the diff, says "looks good" or flags a few nits, and lets it through. The implementer either over-trusts their own testing or politely waves away concerns. Bugs ship.

Strict review fixes this by:

  1. Inverting the default — assume BLOCKED until proven OK, not OK until proven broken
  2. Making the checklist mandatory — same items every time, no "I'll trust my gut"
  3. Demanding evidence — "tested this" without exit 0 output doesn't count
  4. Giving specific改法 — reviewer must show what "correct" looks like, not just point at problems

Waiver rules: extended rationale

Memory is input, not waiver

Cross-project memory entries (~/.config/maigo/memory/) can inform what counts as a convention vs. a real bug. They cannot:

  • Replace any of the 9 mandatory checklist items
  • Lower the must-fix threshold ("user previously accepted X" is not evidence)
  • Override evidence demands ("user prefers Y" needs a memory entry of type project, not feedback, to count as a convention claim)

If memory says "user prefers integration tests" → that's a project entry; treat as part of checklist item 4. If memory says "user complained about strict review last time" → that's feedback; informational only, do not soften review.

Existing repo content is input, not waiver

Prose already in this repo(agent 檔的口頭禪、人設範例台詞、example output)並非因為早就 commit 進來就被視為已驗證

審視 diff 動到引述 / 台詞 / catchphrase 時,來源必須限定為:

  • (a) 原作明文可查(含集數 / 場景)
  • (b) 使用者親自於某 turn 確認過(可追溯)
  • (c) maigo 自創且 explicitly 標明為自創

既有檔內已存在的引述也算——若無 (a)/(b)/(c) 出處,仍須 flag,即使該行不是本次 diff 動的。Past review rounds 讓 unverified 引述通過不構成 amnesty;只要本次 diff 觸及(延伸、跨檔引用、進入新 context)該引述,就重新驗。

This rule is [[feedback_no_fabrication]] applied to repo content — checklist item 6(no unexplained magic)的延伸:未經驗證的引述就是 magic string。

Commit / staging state is not in review scope

Strict review evaluates the diff content, not whether it's been committed yet. The 9-item checklist is about code correctness, conventions, edge cases, evidence — none of those depend on whether the changes are staged, uncommitted in the working tree, or already in HEAD.

In /maigo:address-comments, the orchestrator deliberately commits after Soyo + Taki pass — that's the documented commit-policy override(see commands/address-comments.md step 5). Soyo flagging「changes are uncommitted」/「git status shows M」/ 「git diff main...HEAD doesn't include the changes」 as BLOCKED collides with the flow and forces the orchestrator to override its own reviewer.

In other go-class flows (/maigo:go, /maigo:quick, /maigo:team) the same applies: Anon writes to the working tree, Soyo reviews, the orchestrator handles commit semantics afterwards. git status cleanliness is not a checklist item.

To inspect what's actually changing, read git diff HEAD (working tree vs HEAD) or git diff --cached (staged) — but verdict turns on the diff content, not on staging state.