
Wrapping a Harness Around Your AI Coding Agent

How I use CLAUDE.md, AGENTS.md, skills, hooks, and checks to make AI coding agents follow repeatable engineering rules.

I use AI to write code, and I also happen to be blind. That second fact changes how I work with the first one. Most developers can glance at a block of generated code and feel like they have checked it. I cannot. Every line goes through a screen reader, one at a time, so I read what the model actually wrote instead of trusting the shape of it.

That constraint taught me that prompting is not enough. I need the agent to follow the same rules every time and show evidence that its work is correct. I call the files and checks that make this happen the harness.

This post explains how I build that harness for the two agents I use most, Claude Code and OpenAI Codex.

What a harness actually is

When I started leaning on AI to write code, I noticed how much it assumed. It would look at the file in front of it and quietly decide the rest of the repository worked the same way, instead of tracing where the data came from or who depended on it. So I did what anyone does at first: I repeated myself. I told it over and over: do not assume, and confirm the facts before you change anything.

Repeating yourself does not scale, and it does not survive a fresh session. The fix is to stop typing the rules and start writing them into files the agent loads automatically, every single time. That is the harness. It has two halves:

  1. Instructions the agent reads at the start of every session, so your standards are the first thing it sees, before your prompt.
  2. Enforcement that runs on its own, so the standards are checked by your tooling rather than by your patience.

Both agents provide places for durable instructions, reusable workflows, configuration, and enforcement. They use different filenames and discovery rules.

The Claude Code side: CLAUDE.md and the .claude directory

Claude Code reads CLAUDE.md files as project instructions. Put one at the root of your project and commit it to git so each session starts with the same repository rules. This is the heart of the harness.

Here is a trimmed version of my own CLAUDE.md:

# Project: codingblindtech.com

## Golden rules
- Never assume. If a fact is not in the code or the docs, ask or verify. Do not invent values or numbers.
- Confirm before you change. State what you are about to do and why, then do it.
- Trace data to its source. Before editing a function, find where its inputs come from and who calls it.
- Tests are not optional. Write the test first, then the code that satisfies it.

## Commands
- Install: `npm install`
- Test: `npm run test`
- Build: `npm run build`

## Boundaries
- Do not change the Astro `site` value or the Firebase or DNS config without asking.

A few things worth knowing about how this file loads, because the behavior is more useful than it first appears.

Claude Code does not read one file. It walks up the directory tree from wherever you launched it, collecting every CLAUDE.md it finds along the way and concatenating them. The collection is additive, not a precedence battle, so a rule in a parent folder and a rule in a subfolder both apply. Files nested below your current directory load lazily, only when Claude touches a file in that subtree. That means you can put repository-wide rules at the root and package-specific rules deeper in, and each layer shows up exactly where it is relevant.
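The upward walk can be pictured with a short Python sketch. This is an illustration of the behavior described above, not Claude Code's implementation; the function names are hypothetical, and the broad-to-specific ordering of the result is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def collect_instruction_files(start: Path, filename: str = "CLAUDE.md") -> list[Path]:
    """Return every `filename` found between the filesystem root and `start`."""
    start = start.resolve()
    found = []
    # start.parents runs from the nearest parent up to the root, so reverse the
    # chain to list broader files first and the most specific file last
    # (the ordering is an assumption made for this sketch).
    for directory in reversed([start, *start.parents]):
        candidate = directory / filename
        if candidate.is_file():
            found.append(candidate)
    return found


def build_context(start: Path) -> str:
    """Concatenate the collected files; nothing is dropped or overridden."""
    files = collect_instruction_files(start)
    return "\n\n".join(path.read_text(encoding="utf-8") for path in files)
```

Calling `build_context(Path.cwd())` from a package folder would return the root rules followed by the package rules, which is the additive layering the paragraph describes.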

For personal instructions that apply across projects, use ~/.claude/CLAUDE.md. For private project-specific instructions, the shared CLAUDE.md can import another file with an @path reference, such as a file in your home directory that never enters the repository. This is the current replacement for relying on a local instructions file that only one developer has.

The .claude directory

CLAUDE.md is the instructions half. The enforcement half, plus everything more structured than a single file, lives in a .claude/ directory next to it. The pieces I lean on:

  • CLAUDE.md at the project root: loaded every session and committed to git.
  • .claude/settings.json: permissions, hooks, environment variables, and model defaults.
  • .claude/rules/: topic-scoped instructions, optionally gated to certain file paths.
  • .claude/skills/: reusable workflows you invoke by name.
  • .claude/agents/: subagents, each with its own prompt and context.
  • ~/.claude/: personal config that applies across all your projects.

The rules/ directory is for instructions that should only apply in certain places. A rules file can carry a paths: frontmatter block so it loads only when Claude is working on files that match. That keeps your TypeScript conventions out of the way when Claude is editing a config file, and vice versa:

---
paths:
  - "src/**/*.ts"
---

# Verification rules for TypeScript
- Every new function ships with a Vitest test in the same change.
- No `any`. If a type is unknown, model it explicitly rather than papering over it.

The most important file for the enforcement half is .claude/settings.json, because that is where hooks live. A hook is a command that Claude Code runs automatically at a point in its lifecycle. This is the part of the harness that does not depend on the model choosing to behave. My favorite is a hook that runs the test suite after Claude edits or writes any file:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npm run test --silent 1>&2 || exit 2" }
        ]
      }
    ]
  }
}

When the tests fail, that command sends the test output to stderr and exits with code 2. Claude Code treats exit code 2 from a PostToolUse hook as a blocking error and passes stderr back to Claude, so it sees the broken test and fixes it before moving on. A bare npm run test typically exits with code 1 on failure, which Claude Code shows to the user rather than to the model, so the redirect and the explicit exit code matter. For someone who works test-first, this is the harness made real: the rule in CLAUDE.md says write the test first, and the hook makes the green suite a condition of progress rather than a suggestion. If you want a hard stop instead of feedback, a PreToolUse hook can deny a tool call outright, which is how you block something like a destructive shell command before it ever executes.
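A PreToolUse guard of that kind might look like the Python sketch below. It relies on Claude Code passing the pending tool call to the hook as JSON on stdin, with tool_name and tool_input fields, and on exit code 2 denying the call. The pattern list and function names are hypothetical; tune them to your own project.

```python
from __future__ import annotations

import json
import re
import sys

# Hypothetical deny-list for this sketch, not an exhaustive safety net.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-(rf|fr)\b",          # recursive forced delete
    r"\bgit\s+push\b.*--force",    # history-rewriting push
    r"\bgit\s+reset\s+--hard\b",   # discards uncommitted work
]


def reason_to_block(payload: dict) -> str | None:
    """Return a message if the pending Bash command looks destructive."""
    if payload.get("tool_name") != "Bash":
        return None
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in DESTRUCTIVE_PATTERNS:
        if re.search(pattern, command):
            return f"Blocked by harness: command matches {pattern!r}"
    return None


def main() -> int:
    """Read the hook payload from stdin and pick the exit code."""
    payload = json.load(sys.stdin)
    reason = reason_to_block(payload)
    if reason:
        print(reason, file=sys.stderr)  # stderr is what the model gets to read
        return 2  # exit code 2 denies the tool call
    return 0
```

To wire it up, end the file with `sys.exit(main())` and register the script under a PreToolUse entry whose matcher is Bash, in the same settings.json that holds the test hook.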

The same directory holds agents/, where you define subagents: focused helpers with their own prompt and their own slice of context. A reviewer subagent whose only job is to hunt for correctness, security, and test risk is a natural fit for the never-assume rule, because it gives you a second, independent pass over the work.

One security rule applies to every coding agent: do not let it read a secret unless the task requires that secret. Anything the agent reads can enter its context, command output, or logs. Deny access to credential files and keep secrets out of prompts.
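One way to back that rule with a check is a small path filter that a PreToolUse hook could call before a file read. The sketch below assumes the Read tool's payload carries the target in tool_input.file_path; the filename patterns and function names are hypothetical examples, not a complete list of credential files.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical filename patterns; extend them for your own stack.
SECRET_PATTERNS = [".env", ".env.*", "*.pem", "id_rsa", "credentials.json"]


def is_secret_path(file_path: str) -> bool:
    """Match on the final path component only, so any directory is covered."""
    name = PurePosixPath(file_path).name
    return any(fnmatch(name, pattern) for pattern in SECRET_PATTERNS)


def reason_to_deny_read(payload: dict) -> str | None:
    """Return a message if a Read call targets a likely credential file."""
    if payload.get("tool_name") != "Read":
        return None
    file_path = payload.get("tool_input", {}).get("file_path", "")
    if is_secret_path(file_path):
        return f"Blocked by harness: {file_path} looks like a credential file"
    return None
```

A filter like this only covers direct file reads. A shell command can still print a secret, so it complements permission rules rather than replacing them.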

The Codex side: AGENTS.md, .agents, and .codex

OpenAI Codex uses the same mental model with different names. The instructions file is AGENTS.md, a plain Markdown file that Codex concatenates into context at the start of every session. Think of it as a README written for the agent rather than for a human: a README explains what a project does, while AGENTS.md explains how the project should be worked on, what to run, and what to never touch.

# AGENTS.md

## Project
Static Astro blog deployed to Firebase Hosting on the free tier.

## Commands
- Install: `npm install`
- Test: `npm run test`
- Build: `npm run build`

## Conventions
- Never assume. Verify facts against the code, or ask, before proceeding.
- Trace inputs to their source before editing a function.
- Write the test first.

## Boundaries
- Do not modify deployment or DNS configuration without confirmation.

AGENTS.md is also used by tools beyond Codex. That makes it a useful place for repository rules that should travel with the code.

Codex loads global instructions from ~/.codex/AGENTS.md, then walks from the project root down to the current working directory. In each directory, AGENTS.override.md takes precedence over AGENTS.md. Instructions closer to the current work appear later and override broader guidance.
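That lookup order can be sketched in Python as follows. It is a reading of the paragraph above, not Codex's own code; the function name is hypothetical, and applying the override rule to the global ~/.codex directory as well is an assumption.

```python
from __future__ import annotations

from pathlib import Path


def codex_instruction_chain(project_root: Path, cwd: Path, home: Path) -> list[Path]:
    """List instruction files in load order: global first, nearest last."""
    project_root = project_root.resolve()
    cwd = cwd.resolve()

    # Start with the global directory, then walk root -> cwd one level at a time.
    directories = [home / ".codex", project_root]
    current = project_root
    for part in cwd.relative_to(project_root).parts:
        current = current / part
        directories.append(current)

    chain = []
    for directory in directories:
        # The override file wins over AGENTS.md within the same directory.
        for name in ("AGENTS.override.md", "AGENTS.md"):
            candidate = directory / name
            if candidate.is_file():
                chain.append(candidate)
                break  # at most one file per directory
    return chain
```

Because later entries sit closer to the current work, reading the list from top to bottom shows which guidance gets the last word.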

The .agents and .codex directories have different jobs

The current Codex layout uses both top-level directories:

  • AGENTS.md: durable repository instructions.
  • .agents/skills/: repository skills with reusable instructions, references, or scripts.
  • .codex/config.toml: project-scoped Codex configuration.
  • .codex/agents/: project-scoped custom subagents, one TOML file per agent.
  • .codex/hooks.json or hooks in .codex/config.toml: project lifecycle hooks.
  • ~/.codex/: personal Codex configuration, instructions, hooks, and custom agents.

The .agents/skills directory is where a repository can store a workflow such as a writing voice, release process, or review checklist. Each skill has a SKILL.md file, and Codex loads the full instructions only when the skill is used.

The .codex/config.toml file controls project settings. Two useful instruction settings are project_doc_max_bytes, which limits the combined instruction size, and project_doc_fallback_filenames, which lets Codex recognize another filename as project instructions.

# .codex/config.toml
project_doc_max_bytes = 65536
project_doc_fallback_filenames = ["TEAM_GUIDE.md"]

[agents]
max_threads = 4

To define a project reviewer subagent, create .codex/agents/reviewer.toml:

name = "reviewer"
description = "Review changes for correctness, security, and missing tests."
developer_instructions = """
Lead with concrete findings.
Verify claims against the code and tests.
"""

Codex loads project-scoped .codex/ configuration only when the project is trusted. In an untrusted project, it skips project config, hooks, and rules, while user and system configuration continue to apply.

Where the two converge

Strip away the filenames and the two harnesses are the same shape, which is the real point.

Both put durable project instructions in Markdown files: CLAUDE.md and AGENTS.md. Both support broader rules at the repository root and more specific rules closer to the work. Both separate committed team configuration from personal configuration in your home directory. Both also provide skills, hooks, permissions, and subagents for work that needs more structure than one instructions file.

The principle underneath

A harness is not a clever prompt. It is the decision to encode your standards as files and checks instead of retyping them and hoping. Instructions set the expectation: never assume, confirm the facts, trace the data to its source, write the test first. Enforcement makes the expectation real: a hook that runs your tests after every edit, a subagent that reviews for risk, a permission rule that keeps the agent away from your secrets.

I built this because I cannot glance at generated code and move on. I need assumptions, errors, and test failures to surface without depending on a visual review. That same harness helps any developer who wants evidence instead of confidence from the model. It is now the first thing I set up before I ask an agent to change a project.

Sources and further reading