Good Prompts Carry Both a Diagnostic Table and a Rubric

A reading of 42 PCH-Optimizer catalog items about diagnostics, verification, and anti-patterns as a way to turn prompt improvement into something judgeable.


If we say a prompt got better, we need evidence. It is not better merely because it became longer. It is not better merely because it sounds more engineered. The most practical part of the PCH-Optimizer catalog is this move: it turns improvement into diagnostic tables and verification rubrics.

This note covers 42 of the 115 catalog items.

Bundle | Items | Role
Prompt Layer Diagnostic | 8 | Find defects in the instruction string
Context Layer Diagnostic | 8 | Find defects in the information shown to the model
Harness Layer Diagnostic | 8 | Find defects in tools, loops, state, and evaluation
Verification & Evaluation | 10 | Judge whether the upgrade worked
Anti-Pattern Guardrail | 8 | Block common ways prompts decay
Total | 42 | The diagnostic and judgment layer

Diagnosis Is Not Listing

The PCH-Optimizer diagnostic table is not a simple checklist. Each row asks for current state, defect or risk, and improvement direction.

current state -> defect/risk -> improvement direction

Current state alone is observation. Defect alone is criticism. Improvement direction alone is a prescription without enough evidence. The three columns together create diagnosis.

“No output format” is not yet a full diagnosis. A stronger row looks like this.

Current state | Defect/risk | Improvement direction
The answer is free-form prose | It is hard to post-process, compare, or detect omissions | Use a table or JSON schema with required fields

Now the upgrade path is visible.
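One way to make the three-column rule checkable is to treat each row as a small record. The sketch below is only an illustration: `DiagnosticRow` and `is_complete` are hypothetical names, not part of the PCH-Optimizer catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRow:
    """One row of a diagnostic table (hypothetical structure)."""

    current_state: str
    defect_or_risk: str
    improvement_direction: str

    def is_complete(self) -> bool:
        # A row counts as a diagnosis only when all three columns are filled.
        columns = (
            self.current_state,
            self.defect_or_risk,
            self.improvement_direction,
        )
        return all(column.strip() for column in columns)


# Only one column filled: an observation, not yet a diagnosis.
weak = DiagnosticRow("No output format", "", "")

# The stronger row from the table above.
strong = DiagnosticRow(
    current_state="The answer is free-form prose",
    defect_or_risk="It is hard to post-process, compare, or detect omissions",
    improvement_direction="Use a table or JSON schema with required fields",
)
```

Here `weak.is_complete()` is false and `strong.is_complete()` is true, which is the difference between a listing and a diagnosis.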

Prompt Diagnostics: The Sentence Failure

The eight Prompt Layer Diagnostic items examine objective, role, instruction clarity, output contract, examples, reasoning scaffold, failure handling, and structure. This is the most familiar part of prompt engineering.

Even here, the point is not to make the prompt prettier. The question is whether the objective is clear, the role is at the right altitude, the instruction is positive rather than mostly prohibitive, the output contract exists, and the examples are representative.

Useful questions include:

  • Can the prompt’s objective be stated in one sentence?
  • Is the role only as specific as needed?
  • Are desired actions clearer than prohibitions?
  • Can a person or program verify the output?
  • Are examples canonical rather than exhaustive?

Prompt diagnosis looks like sentence editing, but it is really about making the output contract visible.
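The question "Can a person or program verify the output?" can be made concrete with a small check. This is a minimal sketch that assumes the prompt asks for a JSON object; the field names in `REQUIRED_FIELDS` are invented for illustration.

```python
import json

# Hypothetical output contract: the fields the prompt promises to return.
REQUIRED_FIELDS = ("summary", "risks", "next_steps")


def missing_fields(raw_output: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent from a JSON model output."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        # Output that is not JSON satisfies none of the contract.
        return list(required)
    if not isinstance(data, dict):
        return list(required)
    return [name for name in required if name not in data]
```

An empty list means the contract held. Anything else names exactly what was omitted, which free-form prose cannot do.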

Context Diagnostics: What the Model Sees

The Context Layer Diagnostic items are less visible and often more important. They examine required information, ordering, retrieval strategy, grounding and citation, memory, token budget, context-rot risk, and redundancy to cut.

A strong prompt with weak context tends to produce average answers. A fairly plain prompt with well-selected context can perform surprisingly well.

The idea worth writing about here is not "how to give the model more context." It is "how to remove context well."

Question | Reason
What behavior breaks if this information is removed? | To keep only high-signal tokens
Is the same fact repeated? | To reclaim token budget
Is key information buried in the middle? | To reduce recall loss in long context
What should be retrieved just in time instead of preloaded? | To separate JIT retrieval from stuffing

Context engineering is not generosity. It is editing.
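Two of the questions above, repeated facts and token budget, lend themselves to a rough sketch. The function below is an assumption-laden illustration: `edit_context` is a hypothetical helper, word count stands in for a real tokenizer, and the snippets are assumed to arrive already ordered by importance.

```python
def edit_context(snippets: list[str], max_words: int) -> list[str]:
    """Drop repeated snippets, then keep what fits within the word budget."""
    seen: set[str] = set()
    kept: list[str] = []
    used = 0
    for snippet in snippets:
        # Normalize case and whitespace so trivial variants count as repeats.
        key = " ".join(snippet.lower().split())
        if key in seen:
            continue  # same fact repeated: reclaim the budget
        seen.add(key)
        cost = len(key.split())  # word count as a crude proxy for tokens
        if used + cost > max_words:
            continue  # does not fit; a candidate for just-in-time retrieval
        kept.append(snippet)
        used += cost
    return kept
```

The skipped snippets are not discarded knowledge. They are the material a harness could fetch on demand instead of preloading.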

Harness Diagnostics: Runtime Failure

Harness Layer Diagnostic catches failures that cannot be seen by reading the prompt sentence alone: tool set, agent loop, state artifacts, self-verification, context management, init/executor split, failure-mode mitigation, and evals.

If an agent keeps declaring work complete too early, adding "finish the job" to the prompt is not enough. We need completion criteria, checkpoints, test commands, progress files, and a retry policy. That is a harness problem.

Harness diagnosis asks:

  • Are the available tools minimal and unambiguous?
  • Does the loop follow gather, plan, act, verify?
  • Is state stored outside the chat as a file or structured artifact?
  • Is completion decided by verification rather than model confidence?
  • Are failures recorded so the same mistake is not repeated?

At this layer, prompt engineering starts to look like software operations.
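A minimal sketch of verification-gated completion, under stated assumptions: `act` and `verify` are hypothetical callables standing in for the agent step and the test command, and the progress file is a plain JSON log. Nothing here is the catalog's own harness.

```python
import json
from pathlib import Path
from typing import Callable


def run_until_verified(
    act: Callable[[int], str],
    verify: Callable[[str], bool],
    progress_file: Path,
    max_attempts: int = 3,
) -> bool:
    """Retry `act` until `verify` passes, logging every attempt to a file."""
    history = []
    for attempt in range(1, max_attempts + 1):
        result = act(attempt)
        passed = verify(result)
        history.append({"attempt": attempt, "result": result, "passed": passed})
        # State lives outside the chat, so a later run can read past failures.
        progress_file.write_text(json.dumps(history, indent=2))
        if passed:
            return True  # completion is decided by the check, not by confidence
    return False  # retry budget exhausted without a passing verification
```

The design choice is that the loop never asks the model whether it is done. It only asks the verifier, and it leaves a record either way.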

Verification Splits the Vibe Apart

The ten Verification & Evaluation items split “good answer” into judgeable pieces. They include happy path, edge case, adversarial, and regression tests, plus rubrics for schema adherence, instruction fidelity, hallucination control, and termination accuracy.

The move is to turn each standard into a measurement method and passing line.

Vague criterion | Operational criterion
It is relevant | It reflects all 5 of 5 required inputs
It is consistent | All output fields match the schema
It is well grounded | Every factual claim has a source or is marked unsupported
It completed the work | Test command passed and open items are zero

This moves prompt improvement out of the realm of taste debates.
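The operational criteria above can be sketched as a pass/fail rubric in code. Everything named here is an assumption for illustration: the `Criterion` type, the shape of the output dictionary, and the five input names are invented, not taken from the catalog.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    check: Callable[[dict], bool]  # measurement method; True is the passing line


def grade(output: dict, rubric: list[Criterion]) -> dict[str, bool]:
    """Return a pass/fail verdict for each criterion."""
    return {criterion.name: criterion.check(output) for criterion in rubric}


# Hypothetical set of five required inputs.
REQUIRED_INPUTS = {"budget", "deadline", "owner", "scope", "risks"}

RUBRIC = [
    # "It is relevant": all required inputs are reflected.
    Criterion("relevant", lambda o: REQUIRED_INPUTS <= set(o.get("inputs_used", []))),
    # "It is well grounded": each claim has a source or is marked unsupported.
    Criterion(
        "grounded",
        lambda o: all(
            bool(claim.get("source") or claim.get("unsupported"))
            for claim in o.get("claims", [])
        ),
    ),
    # "It completed the work": tests passed and no open items remain.
    Criterion(
        "complete",
        lambda o: o.get("tests_passed") is True and o.get("open_items") == 0,
    ),
]
```

Calling `grade(output, RUBRIC)` returns a dictionary of booleans, so two reviewers looking at the same output get the same verdict.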

Anti-Patterns Are Direction Changes, Not Just Bans

The eight Anti-Pattern Guardrail items cover edge-case stuffing, over-prescriptive rules, ambiguous tool additions, bloated context, invented personas, duplicated sections, vibe evaluation, and missing state artifacts.

The important move is not just “do not do this.” Each anti-pattern needs a replacement behavior.

Anti-pattern | Replacement behavior
Exhaustive edge-case stuffing | Curate canonical examples
Ambiguous tool additions | Minimize tools and state when to use each
Vibe evaluation | Use pass/fail rubrics
Missing state artifacts | Keep progress files, checkpoints, and regression tests

Prohibition leaves a blank. Replacement behavior gives the model the next foothold.

Prompt Idea from This Bundle

Do not rewrite my prompt immediately.
First diagnose it across Prompt, Context, and Harness.
For each layer, produce a table with current state, defect/risk, and improvement direction.
Then improve only the top three critical defects.
End with happy path, edge case, adversarial, and regression tests, plus a pass/fail rubric.

This prompt does not ask for something “more impressive.” It asks for something more judgeable. That is why PCH-Optimizer is useful. It treats prompts not as sentences to polish, but as work artifacts whose failures can be observed and tested.

The next note looks at how the diagnostic results become a technique menu and assembly templates.
