2026-06-22

Good Prompts Carry Both a Diagnostic Table and a Rubric

A reading of 42 PCH-Optimizer catalog items about diagnostics, verification, and anti-patterns as a way to turn prompt improvement into something judgeable.


If we say a prompt got better, we need evidence. It is not better merely because it became longer. It is not better merely because it sounds more engineered. The most practical part of the PCH-Optimizer catalog is this move: it turns improvement into diagnostic tables and verification rubrics.

This note covers 42 of the 115 catalog items.

Bundle	Items	Role
Prompt Layer Diagnostic	8	Find defects in the instruction string
Context Layer Diagnostic	8	Find defects in the information shown to the model
Harness Layer Diagnostic	8	Find defects in tools, loops, state, and evaluation
Verification & Evaluation	10	Judge whether the upgrade worked
Anti-Pattern Guardrail	8	Block common ways prompts decay
Total	42	The diagnostic and judgment layer

Diagnosis Is Not Listing

The PCH-Optimizer diagnostic table is not a simple checklist. Each row asks for current state, defect or risk, and improvement direction.

current state -> defect/risk -> improvement direction

Current state alone is observation. Defect alone is criticism. Improvement direction alone is a prescription without enough evidence. The three columns together create diagnosis.

“No output format” is not yet a full diagnosis. A stronger row looks like this.

Current state	Defect/risk	Improvement direction
The answer is free-form prose	It is hard to post-process, compare, or detect omissions	Use a table or JSON schema with required fields

Now the upgrade path is visible.
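One way to make the three-column rule checkable is to treat a row as a small record and ask which columns are still empty. This is a minimal sketch, not part of the catalog: the class name `DiagnosticRow`, its field names, and the choice to file “No output format” under current state are assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class DiagnosticRow:
    """One row of a diagnostic table (hypothetical field names)."""
    current_state: str
    defect_or_risk: str
    improvement_direction: str

    def missing_columns(self) -> list[str]:
        # A column counts as missing when it is empty or whitespace-only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def is_diagnosis(self) -> bool:
        # Only a row with all three columns filled counts as a diagnosis.
        return not self.missing_columns()


# An observation with nothing else filled in.
weak_row = DiagnosticRow("No output format", "", "")

# The stronger row from the table above.
strong_row = DiagnosticRow(
    "The answer is free-form prose",
    "It is hard to post-process, compare, or detect omissions",
    "Use a table or JSON schema with required fields",
)

print(weak_row.missing_columns())
print(strong_row.is_diagnosis())
```

The check only catches empty columns. Whether the defect actually follows from the current state still needs a human or a second review pass.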

Prompt Diagnostics: The Sentence Failure

The eight Prompt Layer Diagnostic items examine objective, role, instruction clarity, output contract, examples, reasoning scaffold, failure handling, and structure. This is the most familiar part of prompt engineering.

Even here, the point is not to make the prompt prettier. The question is whether the objective is clear, the role is at the right altitude, the instruction is positive rather than mostly prohibitive, the output contract exists, and the examples are representative.

Useful questions include:

Can the prompt’s objective be stated in one sentence?
Is the role only as specific as needed?
Are desired actions clearer than prohibitions?
Can a person or program verify the output?
Are examples canonical rather than exhaustive?

Prompt diagnosis looks like sentence editing, but it is really about making the output contract visible.
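The question “Can a person or program verify the output?” can be answered with a few lines of code once the contract is explicit. The sketch below assumes the contract is a JSON object with three required fields. The field names in `REQUIRED_FIELDS` and the function `contract_violations` are hypothetical, chosen only to illustrate the idea.

```python
import json

# Hypothetical contract: the prompt asks for a JSON object with these fields.
REQUIRED_FIELDS = {"summary", "risks", "next_steps"}


def contract_violations(raw_output: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return a list of contract problems; an empty list means the output passes."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    missing = sorted(set(required) - set(data))
    return [f"missing required field: {name}" for name in missing]


prose_result = contract_violations("The answer is free-form prose.")
partial_result = contract_violations('{"summary": "ok"}')
full_result = contract_violations('{"summary": "ok", "risks": [], "next_steps": []}')

print(prose_result)
print(partial_result)
print(full_result)
```

A prompt without an output contract gives this function nothing to check, which is the defect the diagnostic row is meant to surface.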

Context Diagnostics: What the Model Sees

The Context Layer Diagnostic items are less visible and often more important. They examine required information, ordering, retrieval strategy, grounding and citation, memory, token budget, context-rot risk, and redundancy to cut.

A strong prompt with weak context produces average answers. A fairly plain prompt with well-selected context can perform surprisingly well.

The idea worth writing about here is not “how to give the model more context.” It is “how to remove context well.”

Question	Reason
What behavior breaks if this information is removed?	To keep only high-signal tokens
Is the same fact repeated?	To reclaim token budget
Is key information buried in the middle?	To reduce recall loss in long context
What should be retrieved just in time instead of preloaded?	To separate JIT retrieval from stuffing

Context engineering is not generosity. It is editing.
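The editing questions above can be sketched as a small filter: drop repeated facts, then keep the highest-priority chunks that fit a budget. Everything here is an illustrative assumption. The `Chunk` type, the hand-assigned `priority` score, and the word-count stand-in for tokens are not from the catalog, and a real system would use an actual tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    priority: int  # hypothetical score: higher means more behavior breaks without it


def rough_tokens(text: str) -> int:
    # Crude stand-in for a tokenizer: counts whitespace-separated words.
    return len(text.split())


def edit_context(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Remove repeated facts, then keep high-priority chunks within the budget."""
    seen: set[str] = set()
    unique: list[Chunk] = []
    for chunk in chunks:
        key = " ".join(chunk.text.lower().split())  # ignore case and spacing
        if key not in seen:
            seen.add(key)
            unique.append(chunk)

    kept: list[Chunk] = []
    used = 0
    # Highest priority first, so key information is not buried in the middle.
    for chunk in sorted(unique, key=lambda c: c.priority, reverse=True):
        cost = rough_tokens(chunk.text)
        if used + cost <= budget:
            kept.append(chunk)
            used += cost
    return kept


candidates = [
    Chunk("Refunds take 5 days.", 3),
    Chunk("refunds  take 5 days.", 3),  # same fact repeated
    Chunk("Company history since 1990 in detail", 1),
    Chunk("Use the v2 API.", 2),
]
edited = edit_context(candidates, budget=8)
print([c.text for c in edited])
```

Chunks that do not make the cut are candidates for just-in-time retrieval rather than preloading.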

Harness Diagnostics: Runtime Failure

The Harness Layer Diagnostic items catch failures that cannot be seen by reading the prompt sentence alone. They cover the tool set, agent loop, state artifacts, self-verification, context management, init/executor split, failure-mode mitigation, and evals.

If an agent keeps declaring work complete too early, adding “finish the job” to the prompt is not enough. We need completion criteria, checkpoints, test commands, progress files, and a retry policy. That is a harness problem.

Harness diagnosis asks:

Are the available tools minimal and unambiguous?
Does the loop follow gather, plan, act, verify?
Is state stored outside the chat as a file or structured artifact?
Is completion decided by verification rather than model confidence?
Are failures recorded so the same mistake is not repeated?

At this layer, prompt engineering starts to look like software operations.
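A minimal sketch of that idea, under stated assumptions: completion is decided by a `verify` callable rather than by the model, failures are recorded and passed to the next attempt, and state is written to a file outside the chat. The function `run_until_verified`, the state layout, and the fake `act` and `verify` callables are hypothetical. The gather and plan steps are folded into `act` to keep the example short.

```python
import json
import tempfile
from pathlib import Path


def run_until_verified(act, verify, state_path: Path, max_attempts: int = 3) -> dict:
    """Retry act() until verify() passes, persisting progress after each attempt."""
    state = {"attempts": 0, "failures": [], "done": False}
    for attempt in range(1, max_attempts + 1):
        state["attempts"] = attempt
        result = act(state["failures"])  # the agent sees earlier failures
        passed, reason = verify(result)  # verification, not confidence, decides
        if passed:
            state["done"] = True
        else:
            state["failures"].append(reason)
        state_path.write_text(json.dumps(state, indent=2))  # state lives in a file
        if passed:
            break
    return state


def fake_act(previous_failures: list[str]) -> str:
    # Stand-in for a model call: it only fixes the work after seeing a failure.
    return "patched" if previous_failures else "draft"


def fake_verify(result: str) -> tuple[bool, str]:
    # Stand-in for a test command: returns (passed, failure reason).
    return (result == "patched", f"check failed for {result!r}")


with tempfile.TemporaryDirectory() as tmp:
    progress_file = Path(tmp) / "progress.json"
    final_state = run_until_verified(fake_act, fake_verify, progress_file)
    saved_state = json.loads(progress_file.read_text())

print(final_state)
```

In a real harness, `verify` would typically run a test command and `act` would call the model with the recorded failures in context.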

Verification Splits the Vibe Apart

The ten Verification & Evaluation items split “good answer” into judgeable pieces. They include happy path, edge case, adversarial, and regression tests, plus rubrics for schema adherence, instruction fidelity, hallucination control, and termination accuracy.

The move is to turn each standard into a measurement method and passing line.

Vague criterion	Operational criterion
It is relevant	It reflects all 5 of 5 required inputs
It is consistent	All output fields match the schema
It is well grounded	Every factual claim has a source or is marked unsupported
It completed the work	Test command passed and open items are zero

This moves prompt improvement out of taste debate.
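The operational column of the table can be written down as pass/fail checks over a run record. The sketch below is one possible encoding, not the catalog's own. The record keys (`required_inputs`, `inputs_reflected`, `schema_errors`, `claims`, `test_exit_code`, `open_items`) and the `Criterion` type are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    name: str
    check: Callable[[dict], bool]  # takes a run record, returns pass/fail


RUBRIC = [
    Criterion("relevant", lambda r: set(r["required_inputs"]) <= set(r["inputs_reflected"])),
    Criterion("consistent", lambda r: not r["schema_errors"]),
    Criterion(
        "grounded",
        lambda r: all(c.get("source") or c.get("marked_unsupported") for c in r["claims"]),
    ),
    Criterion("complete", lambda r: r["test_exit_code"] == 0 and r["open_items"] == 0),
]


def score(run: dict, rubric: list[Criterion]) -> dict[str, bool]:
    """Evaluate every criterion and return a name -> pass/fail mapping."""
    return {criterion.name: criterion.check(run) for criterion in rubric}


# Hypothetical run record with one claim that has no source and no marker.
example_run = {
    "required_inputs": ["a", "b", "c", "d", "e"],
    "inputs_reflected": ["a", "b", "c", "d", "e"],
    "schema_errors": [],
    "claims": [
        {"text": "Latency dropped", "source": "benchmark.md"},
        {"text": "Users prefer it"},
    ],
    "test_exit_code": 0,
    "open_items": 0,
}

results = score(example_run, RUBRIC)
passed = all(results.values())
print(results, passed)
```

The run fails on exactly one criterion, so the discussion can focus on the ungrounded claim instead of on whether the answer "felt" good.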

Anti-Patterns Are Direction Changes, Not Just Bans

The eight Anti-Pattern Guardrail items cover edge-case stuffing, over-prescriptive rules, ambiguous tool additions, bloated context, invented personas, duplicated sections, vibe evaluation, and missing state artifacts.

The important move is not just “do not do this.” Each anti-pattern needs a replacement behavior.

Anti-pattern	Replacement behavior
Exhaustive edge-case stuffing	Curate canonical examples
Ambiguous tool additions	Minimize tools and state when to use each
Vibe evaluation	Use pass/fail rubrics
Missing state artifacts	Keep progress files, checkpoints, and regression tests

Prohibition leaves a blank. Replacement behavior gives the model the next foothold.
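The pairing rule can be enforced mechanically: a guardrail without a replacement behavior is rejected before it reaches the prompt. This is a small sketch under that assumption. The `GUARDRAILS` mapping copies the table above, while `render_guardrails` and its output wording are hypothetical.

```python
# Anti-pattern -> replacement behavior, taken from the table above.
GUARDRAILS = {
    "Exhaustive edge-case stuffing": "Curate canonical examples",
    "Ambiguous tool additions": "Minimize tools and state when to use each",
    "Vibe evaluation": "Use pass/fail rubrics",
    "Missing state artifacts": "Keep progress files, checkpoints, and regression tests",
}


def render_guardrails(guardrails: dict[str, str]) -> list[str]:
    """Turn each pair into one instruction line; refuse bare prohibitions."""
    lines = []
    for anti_pattern, replacement in guardrails.items():
        if not replacement.strip():
            raise ValueError(f"no replacement behavior for: {anti_pattern}")
        lines.append(f"Instead of {anti_pattern.lower()}: {replacement.lower()}.")
    return lines


guardrail_lines = render_guardrails(GUARDRAILS)
for line in guardrail_lines:
    print(line)
```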

Prompt Idea from This Bundle

Do not rewrite my prompt immediately.
First diagnose it across Prompt, Context, and Harness.
For each layer, produce a table with current state, defect/risk, and improvement direction.
Then improve only the top three critical defects.
End with happy path, edge case, adversarial, and regression tests, plus a pass/fail rubric.

This prompt does not ask for something “more impressive.” It asks for something more judgeable. That is why PCH-Optimizer is useful. It treats prompts not as sentences to polish, but as work artifacts whose failures can be observed and tested.

The next note looks at how the diagnostic results become a technique menu and assembly templates.

