If we claim a prompt got better, we need evidence. A prompt is not better merely because it became longer, or because it sounds more engineered. The most practical move in the PCH-Optimizer catalog is turning “improvement” into diagnostic tables and verification rubrics.
This note covers 42 of the 115 catalog items.
| Bundle | Items | Role |
|---|---|---|
| Prompt Layer Diagnostic | 8 | Find defects in the instruction string |
| Context Layer Diagnostic | 8 | Find defects in the information shown to the model |
| Harness Layer Diagnostic | 8 | Find defects in tools, loops, state, and evaluation |
| Verification & Evaluation | 10 | Judge whether the upgrade worked |
| Anti-Pattern Guardrail | 8 | Block common ways prompts decay |
| Total | 42 | The diagnostic and judgment layer |
Diagnosis Is Not Listing
The PCH-Optimizer diagnostic table is not a simple checklist. Each row asks for current state, defect or risk, and improvement direction.
`current state -> defect/risk -> improvement direction`
Current state alone is observation. Defect alone is criticism. Improvement direction alone is a prescription without enough evidence. The three columns together create diagnosis.
“No output format” is not yet a full diagnosis. A stronger row looks like this.
| Current state | Defect/risk | Improvement direction |
|---|---|---|
| The answer is free-form prose | It is hard to post-process, compare, or detect omissions | Use a table or JSON schema with required fields |
Now the upgrade path is visible.
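To make the “three columns or it is not a diagnosis” rule mechanical, here is a minimal Python sketch. `DiagnosticRow` and `render_table` are hypothetical names for illustration, not part of PCH-Optimizer; the sketch refuses any row with an empty column and renders complete rows in the same table shape as above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticRow:
    """One row of a diagnostic table: observation, risk, and direction together."""
    current_state: str
    defect_or_risk: str
    improvement_direction: str

    def __post_init__(self) -> None:
        # A row with an empty column is only an observation, a criticism,
        # or a prescription, not yet a diagnosis, so reject it.
        for name in ("current_state", "defect_or_risk", "improvement_direction"):
            if not getattr(self, name).strip():
                raise ValueError(f"diagnostic row is missing {name}")


def render_table(rows: list[DiagnosticRow]) -> str:
    """Render complete rows as a Markdown table with the three diagnostic columns."""
    lines = [
        "| Current state | Defect/risk | Improvement direction |",
        "|---|---|---|",
    ]
    for row in rows:
        lines.append(
            f"| {row.current_state} | {row.defect_or_risk} | {row.improvement_direction} |"
        )
    return "\n".join(lines)
```

Writing `DiagnosticRow("No output format", "", "")` raises `ValueError`, which is the point: the bare observation cannot enter the table until someone states the risk and the direction.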
Prompt Diagnostics: Where the Sentence Fails
The eight Prompt Layer Diagnostic items examine objective, role, instruction clarity, output contract, examples, reasoning scaffold, failure handling, and structure. This is the most familiar part of prompt engineering.
Even here, the point is not to make the prompt prettier. The question is whether the objective is clear, the role is at the right altitude, the instruction is positive rather than mostly prohibitive, the output contract exists, and the examples are representative.
Useful questions include:
- Can the prompt’s objective be stated in one sentence?
- Is the role only as specific as needed?
- Are desired actions clearer than prohibitions?
- Can a person or program verify the output?
- Are examples canonical rather than exhaustive?
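Two of these questions can be given crude automated signals. The Python sketch below is only a heuristic illustration under stated assumptions: the keyword lists are hypothetical, and a real review would still rely on a person or a model as the judge.

```python
import re

# Hypothetical keyword lists; regular expressions only approximate the questions.
PROHIBITION = re.compile(r"\b(do not|don't|never|avoid)\b", re.IGNORECASE)
OUTPUT_CONTRACT = re.compile(r"\b(json|schema|table|fields?)\b", re.IGNORECASE)


def prompt_flags(prompt: str) -> dict[str, bool]:
    """Crude flags for two of the prompt-layer questions."""
    lines = [line for line in prompt.splitlines() if line.strip()]
    prohibitive = sum(1 for line in lines if PROHIBITION.search(line))
    return {
        # "Are desired actions clearer than prohibitions?"
        "mostly_prohibitive": prohibitive > len(lines) - prohibitive,
        # "Can a person or program verify the output?"
        "has_output_contract": bool(OUTPUT_CONTRACT.search(prompt)),
    }
```

A flag is a reason to ask the question by hand, not a verdict: a prompt can mention “JSON” and still leave its required fields undefined.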
Prompt diagnosis looks like sentence editing, but it is really about making the output contract visible.
Context Diagnostics: What the Model Sees
The Context Layer Diagnostic items are less visible and often more important. They examine required information, ordering, retrieval strategy, grounding and citation, memory, token budget, context-rot risk, and redundancy to cut.
A strong prompt with weak context produces average answers. A fairly plain prompt with well-selected context can perform surprisingly well.
The key idea here is not “how to give the model more context” but “how to remove context well.”
| Question | Reason |
|---|---|
| What behavior breaks if this information is removed? | To keep only high-signal tokens |
| Is the same fact repeated? | To reclaim token budget |
| Is key information buried in the middle? | To reduce recall loss in long context |
| What should be retrieved just in time instead of preloaded? | To separate JIT retrieval from stuffing |
Context engineering is not generosity. It is editing.
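The table’s questions can be turned into a small editing pass. This Python sketch is an illustration, not PCH-Optimizer’s method: `edit_context` is a hypothetical helper, the caller is assumed to mark the high-signal chunks in `key` (deciding what breaks if a chunk is removed still takes judgment), and whitespace word count stands in for a real tokenizer.

```python
def edit_context(chunks: list[str], key: set[int], budget: int) -> list[str]:
    """Cut repeated chunks, fit the budget, and keep key chunks out of the middle.

    Assumptions: `key` holds indices of high-signal chunks, and a chunk's cost
    is its whitespace-separated word count (a rough stand-in for tokens).
    """
    seen: set[str] = set()
    unique: list[tuple[int, str]] = []
    for i, chunk in enumerate(chunks):
        normalized = " ".join(chunk.split()).lower()
        if normalized not in seen:  # drop repeated facts to reclaim budget
            seen.add(normalized)
            unique.append((i, chunk))

    # Spend the budget on key chunks first (sorted() is stable, so the
    # original order is kept within each group).
    ranked = sorted(unique, key=lambda item: item[0] not in key)
    keys: list[str] = []
    others: list[str] = []
    used = 0
    for i, chunk in ranked:
        cost = len(chunk.split())
        if used + cost > budget:
            continue  # cut: this chunk does not fit the remaining budget
        used += cost
        (keys if i in key else others).append(chunk)

    # Put key chunks at both edges so nothing important is buried mid-context.
    half = (len(keys) + 1) // 2
    return keys[:half] + others + keys[half:]
```

Anything that should be fetched just in time simply never enters `chunks`; the pass above only edits what was preloaded.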
Harness Diagnostics: Runtime Failure
The Harness Layer Diagnostic items catch failures that cannot be seen by reading the prompt text alone: the tool set, agent loop, state artifacts, self-verification, context management, init/executor split, failure-mode mitigation, and evals.
If an agent keeps declaring work complete too early, adding “finish the job” to the prompt is not enough. We need completion criteria, checkpoints, test commands, progress files, and retry policy. That is a harness problem.
Harness diagnosis asks:
- Are the available tools minimal and unambiguous?
- Does the loop follow gather, plan, act, verify?
- Is state stored outside the chat as a file or structured artifact?
- Is completion decided by verification rather than model confidence?
- Are failures recorded so the same mistake is not repeated?
At this layer, prompt engineering starts to look like software operations.
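The harness questions map onto a small control loop. This is a minimal Python sketch, assuming a caller-supplied `act` step (standing in for gather, plan, and act) and a `verify` check such as a test command’s exit status; `run_until_verified` and the JSON file layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_until_verified(
    act: Callable[[dict], None],
    verify: Callable[[], bool],
    progress_file: Path,
    max_attempts: int = 3,
) -> bool:
    """Act, then verify; completion is decided by `verify`, not by the agent.

    State lives in a JSON progress file outside the chat, so a restarted run
    resumes where it stopped and can read earlier failures.
    """
    if progress_file.exists():
        progress = json.loads(progress_file.read_text())
    else:
        progress = {"attempts": 0, "failures": [], "done": False}

    while not progress["done"] and progress["attempts"] < max_attempts:
        act(progress)  # the step can read recorded failures before acting again
        progress["attempts"] += 1
        if verify():
            progress["done"] = True
        else:
            progress["failures"].append(
                f"attempt {progress['attempts']}: verification failed"
            )
        # Checkpoint after every attempt so state survives a crash or restart.
        progress_file.write_text(json.dumps(progress, indent=2))
    return progress["done"]
```

The design choice is that the model never gets to say “done”: only `verify` can set the flag, and a failed attempt leaves a written record for the next one.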
Verification Splits the Vibe Apart
The ten Verification & Evaluation items split “good answer” into judgeable pieces. They include happy path, edge case, adversarial, and regression tests, plus rubrics for schema adherence, instruction fidelity, hallucination control, and termination accuracy.
The move is to turn each criterion into a measurement method and a passing threshold.
| Vague criterion | Operational criterion |
|---|---|
| It is relevant | It reflects all 5 of 5 required inputs |
| It is consistent | All output fields match the schema |
| It is well grounded | Every factual claim has a source or is marked unsupported |
| It completed the work | Test command passed and open items are zero |
This takes prompt improvement out of debates about taste.
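The operational column of the table above can be written as pass/fail checks. In this Python sketch the shape of `output` (`fields`, `claims`, `inputs_used`, `tests_passed`, `open_items`) is a hypothetical convention chosen for illustration, not a format defined by PCH-Optimizer.

```python
from typing import Any


def score(output: dict[str, Any], schema: dict[str, type],
          required_inputs: set[str]) -> dict[str, bool]:
    """Pass/fail checks mirroring the operational criteria in the table.

    The structure of `output` is an assumption made for this sketch.
    """
    fields = output.get("fields", {})
    claims = output.get("claims", [])
    return {
        # "It reflects all 5 of 5 required inputs"
        "relevant": required_inputs <= set(output.get("inputs_used", [])),
        # "All output fields match the schema": same names, expected types
        "schema": set(fields) == set(schema)
        and all(isinstance(fields[name], kind) for name, kind in schema.items()),
        # "Every factual claim has a source or is marked unsupported"
        "grounded": all(c.get("source") or c.get("unsupported") for c in claims),
        # "Test command passed and open items are zero"
        "completed": output.get("tests_passed") is True and output.get("open_items") == 0,
    }
```

An upgrade passes only when every value is `True`; a failing key names the criterion to fix instead of leaving a general impression that the answer “felt worse.”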
Anti-Patterns Are Direction Changes, Not Just Bans
The eight Anti-Pattern Guardrail items cover edge-case stuffing, over-prescriptive rules, ambiguous tool additions, bloated context, invented personas, duplicated sections, vibe evaluation, and missing state artifacts.
The important move is not just “do not do this.” Each anti-pattern needs a replacement behavior.
| Anti-pattern | Replacement behavior |
|---|---|
| Exhaustive edge-case stuffing | Curate canonical examples |
| Ambiguous tool additions | Minimize tools and state when to use each |
| Vibe evaluation | Use pass/fail rubrics |
| Missing state artifacts | Keep progress files, checkpoints, and regression tests |
Prohibition leaves a blank. Replacement behavior gives the model the next foothold.
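The same principle can be enforced in tooling: a check that reports an anti-pattern should always hand back its replacement behavior. In this Python sketch, the replacement strings come from the table above, while the detection rules, the `max_examples` threshold, and the “use when” convention for tool descriptions are hypothetical heuristics.

```python
# Replacement behaviors, copied from the anti-pattern table.
REPLACEMENTS = {
    "edge-case stuffing": "Curate canonical examples",
    "ambiguous tool additions": "Minimize tools and state when to use each",
    "vibe evaluation": "Use pass/fail rubrics",
    "missing state artifacts": "Keep progress files, checkpoints, and regression tests",
}


def lint(prompt: str, tools: dict[str, str], has_rubric: bool,
         has_progress_file: bool, max_examples: int = 5) -> list[tuple[str, str]]:
    """Return (anti-pattern, replacement) pairs; a finding never comes without a fix.

    Detection rules and the max_examples threshold are hypothetical heuristics.
    """
    found = []
    examples = sum(1 for line in prompt.splitlines()
                   if line.strip().lower().startswith("example"))
    if examples > max_examples:
        found.append("edge-case stuffing")
    # Assumed convention: every tool description says when to use the tool.
    if any("use when" not in description.lower() for description in tools.values()):
        found.append("ambiguous tool additions")
    if not has_rubric:
        found.append("vibe evaluation")
    if not has_progress_file:
        found.append("missing state artifacts")
    return [(name, REPLACEMENTS[name]) for name in found]
```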
Prompt Idea from This Bundle
`Do not rewrite my prompt immediately.`
This prompt does not ask for something “more impressive.” It asks for something more judgeable. That is why PCH-Optimizer is useful. It treats prompts not as sentences to polish, but as work artifacts whose failures can be observed and tested.
The next note looks at how the diagnostic results become a technique menu and assembly templates.