Thoughts on AI agents
On the engineering practices that help humans and robots work together, and my current agentic workflow.
14 March 2026 – Goulven CLEC'H
My job has changed
Few fields have been as transformed by artificial intelligence (AI) as software development. The hype and the potential are everywhere, yet concrete use cases remain to be built, and adoption still feels slow.
But this picture looks quite different in a sector obsessed with disruption, with engineers open to new tools, and where code offers a direct interface with those Large Language Models (LLMs). In early 2022, months before ChatGPT launched publicly, GitHub Copilot had already appeared in my editor. And over the following two years, AI became a peripheral but daily tool: for writing tasks (GPT-4 early 2023, then Claude 3.5 Sonnet), code review (since GPT-4o in early 2024), and deep search (o1-preview in late 2024). I also started integrating it into products via the OpenAI API, for instance to generate property valuations at Enchères Immo (GPT-4o mini).
2025 is when things really accelerated. AI agents — capable of executing commands and reacting to results — broke free from the chat paradigm: no more copy-pasting suggestions, fewer hallucinations thanks to test runs, and much shorter iteration loops. At the end of last year, agents even gained the ability to launch other agents (called subagents), which opened up a whole new world of possibilities for structured workflows and separation of concerns.
The shift has been striking. Early 2025, under 10% of my code was machine-generated. By year’s end, around 50%. Today, in early 2026, roughly 90% is generated by agents — most of it without any manual edits. A pattern I see in many colleagues, and spreading across the industry.
My agentic workflow
This rise of agents in my daily work came with plenty of trial and error, constantly challenged by new models, new tools, and the blog posts I stumble upon.
After my first experiments in Visual Studio Code, I bounced between Codex, Cursor, and Claude Code, before eventually coming back. My favourite editor had caught up in features, and nothing beats working in my IDE with all my extensions, the ability to switch model providers (like Cursor), and separate profiles (e.g. work vs personal config and subscriptions).
Like many, I also went through a heavy « prompt engineering » phase… crafting the perfect prompt, maintaining a prompt library, trying role-play (« you are an experienced senior developer, you know the project inside out, you write clean code »), etc. But today, my workflow is more focused on structuring the working environment and the agents themselves, to the point where my typical prompt looks like:
Implement this feature https://github.com/bruits/project/issues/123
Because, as models improve, working with AI feels less like finding the right magic formula, and more like structuring an environment — documentation, tools, and processes — that lets it work effectively.
The most critical factor: context management. Even beyond 100k tokens, models lose precision when overloaded,1 and the recent ones tend to explore codebases more, use more tools, and do more introspective work. Extremely productive if your project is well-structured, but a fast path to noise otherwise…
AGENTS.md and skills
AGENTS.md (and its variants CLAUDE.md, GEMINI.md, etc.) is the central guide for agents, but a poorly written one can reduce effectiveness2 and clog the context from the very first interaction. Among the mistakes I’ve observed: stuffing the file with language/framework best practices,3 formatting/linting rules,4 or duplicated documentation.5
What seems to work instead is a concise file:6 a brief overview, a few key commands,7 links to documentation, and universally applicable guardrails.8 A command hint (e.g. pnpm instead of npm) is something the agent would likely figure out on its own, but it saves tokens and time. Keep in mind that agents may also ignore AGENTS.md if they deem the instructions irrelevant or the file too long. Claude, for example, has in its system prompt: « important: this context may or may not be relevant to your tasks. You should not respond to this context unless it is highly relevant to your task. »
Paired with a solid documentation SSoT and tests covering existing behaviour (both discussed below), this creates « progressive disclosure »,9 enabling agents to incrementally discover relevant context through exploration.101112
A concrete example from my project Sampo:
# Agents Guide
Sampo is a Rust monorepo to automate changelogs, versioning, and publishing—even for monorepos across multiple package registries 🧭
## Useful Commands
```sh
cargo fmt --all                  # format
cargo clippy --all --all-targets # lint
cargo test --all                 # test
```
## Useful Resources
- In [CONTRIBUTING.md](./CONTRIBUTING.md): [Quality Guidelines](./CONTRIBUTING.md#quality-guidelines) apply to agents and humans equally, [Getting Started](./CONTRIBUTING.md#getting-started) helps you understand the project structure, and [Philosophy](./CONTRIBUTING.md#philosophy) is the project’s north star.
- The [README](./README.md) lists all crates, and per-crate READMEs (e.g. [sampo-core](./crates/sampo-core/README.md)) contain public API documentation; they should stay concise and user-facing.
- [GitHub](https://github.com/bruits/sampo) Issues and PRs are the best place for implementation details, design discussions, and technical decisions.
## Agent Guardrails
- Do not create new documentation files to explain implementation.
- Do not add external dependencies without justification. Prefer the standard library and existing utilities.
- All code, comments, documentation, commit messages, and user-facing output must be in English.
- New features or bug fixes should have a changeset generated by Sampo, see [CONTRIBUTING.md](./CONTRIBUTING.md#writing-changesets) for guidelines.

Skills are a complementary tool for documenting specific capabilities: a recurring task, a codebase quirk, or an agent-specific instruction. They can be triggered by the user (slash command) or automatically picked up by the agent when relevant.
When concise and actionable, they extend progressive disclosure nicely. But I also see plenty of questionable practices: downloading hundreds of framework-specific skills, duplicating documentation, or relying on them as safety guardrails (more on that later). In general, keep in mind that skill invocation is less reliable than AGENTS.md,13 and repeated calls burn tokens on irrelevant tasks, clogging the context.
As a concrete example, here is a skill for generating a changeset with Sampo. It links to the documentation and provides the non-interactive command (bypassing CLI prompts). Without the skill, given that Sampo is a niche tool, an agent would need trial and error to discover this command, wasting tokens:
---
name: sampo-changeset
description: Create or update changesets to describe public API changes, and trigger changelog generation and release planning.
---
[Sampo](https://github.com/bruits/sampo) is a tool to automate changelogs, versioning, and publishing. It uses changesets (markdown files describing changes explicitly) to bump versions (in SemVer format), generate changelogs (human-readable files listing changes), and publish packages (to their respective registries).
See [CONTRIBUTING.md](/CONTRIBUTING.md#writing-changesets) for changeset writing guidelines.
## Creating New Changesets
To create a changeset non-interactively:
```sh
sampo add -p <package> -b <bump> -m "<description>"
```
Where `<bump>` is `major`, `minor`, or `patch`. Use `-p` multiple times to target several packages. Prefix with the ecosystem to disambiguate: `-p cargo/my-crate`. When `changesets.tags` is configured, use `-t <tag>` to categorize the changeset.
## Updating Existing Changesets
Pending changesets are stored in the `.sampo/changesets` directory. You can edit these markdown files directly, as long as you follow the guidelines above and the Sampo format (read `.sampo/changeset.md.example` for reference).

In my opinion, most overprompting boils down to poor context management and a false assumption that models need hand-holding to be effective. In reality, they adapt well to a structured environment, and their ability to deliver quality code leaps with every new release. No magic prompt will give you access to Claude 8 or Codex 6… sometimes quite the opposite.
Supervisors and subagents
So how do I get the most out of those current models?
The core idea of my workflow is a coordination agent (Supervisor) that orchestrates a structured cycle of specialized subagents, each confined to a specific role and invoked iteratively as needed.
One or more Analysts read the issue,14 explore the codebase, and discover technical context to produce an actionable brief, without ever modifying anything. The Builder, the only agent allowed to write code, implements it following project conventions. The Reviewer then evaluates the diff and raises critiques. Finally, the Fixer uses every tool at its disposal (tests, debugging, codebase search, etc.) to decide: fix needed (→ back to Builder), invalid (→ ignored), or ambiguous (→ question to the user).
Three benefits: targeted context (each agent only sees its relevant brief), tool restriction (the Reviewer can’t edit files, the Builder can’t touch GitHub), and an iterative validation loop (Reviewer → Fixer(s) → Builder) avoiding the classic « one-shot » agents that implement something and consider the work done without verification.
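The cycle above can be sketched as a plain loop. Everything below is a mock: each subagent is stood in for by a shell function that just echoes a string (real subagents are LLM invocations with their own isolated context and tools), but the control flow matches the Reviewer → Fixer → Builder loop:

```shell
#!/bin/sh
# Mock sketch of the Supervisor cycle; each function stands in for a
# subagent invocation (real ones would be LLM calls with isolated context).
analyst()  { echo "brief: issue context and relevant files"; }
builder()  { echo "diff for: $1"; }
reviewer() { echo "critique of: $1"; }
fixer()    { echo "fix"; }  # verdict: fix | invalid | ambiguous

brief=$(analyst)
diff=$(builder "$brief")
rounds=0
verdict="fix"
# Iterate Reviewer -> Fixer -> Builder until no fix is needed (capped here,
# because this mock Fixer always answers "fix").
while [ "$verdict" = "fix" ] && [ "$rounds" -lt 3 ]; do
  critique=$(reviewer "$diff")
  verdict=$(fixer "$critique")
  if [ "$verdict" = "fix" ]; then diff=$(builder "$critique"); fi
  rounds=$((rounds + 1))
done
echo "validation loop finished after $rounds round(s), last verdict: $verdict"
```

In a real setup, the Supervisor is itself an agent and each function call is a subagent spawn; the cap on rounds is a cheap safeguard against endless review loops.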
Here is a concrete example of a subagent prompt, for the Fixer. I’ve tried to keep it simple, both to limit context window noise as mentioned above, and to make it easy to change over the course of experiments:
---
name: Fixer
model: Claude Opus 4.6 (copilot)
description: "Validates code review feedback by analyzing whether reported issues exist, then provides an honest verdict and minimal fix recommendations."
tools: ["vscode", "execute", "read", "edit", "search", "web", "github/*", "agent", "todo"]
---
This agent validates a single code review critique, without making any code changes. It analyzes carefully whether the reported issue actually exists, and provides an honest verdict with minimal fix recommendations if needed.
## Capabilities
- Follow logic end-to-end, check assumptions and edge cases
- Run tests and debugging to confirm or refute the reported issue
- Check whether the critique falls within scope of a GitHub issue (if provided)
- Compare with coding standards stated in AGENTS.md and CONTRIBUTING.md
## Outputs
- **Verdict**: Is the critique valid (fully/partly/not), and does it require a fix?
- If needed, smallest safe fix recommendation and any open questions
## Safety Rules
- **Explicit order required**: Never push commits, open PRs, or create/modify issues.
- **Production forbidden**: Never create, modify, or delete anything in production environments.

The safety rules are mainly there to prevent the agent from repeatedly attempting an undesired action. As previously mentioned, you shouldn’t rely on prompts to block high-risk actions (I promise we’ll get to that).
MCP servers and other tools
The isolated context of subagents also allows equipping them with powerful but potentially token-heavy tools, such as MCP servers. These are standardised services that expose external tools (APIs, databases, files…) to an agent, using the Model Context Protocol (MCP).
This protocol guides the LLM through tool descriptions and responses formatted specifically for the agent, whereas a CLI is usually not optimised for a model, sometimes causing laborious trial and error.15 It also standardises authentication via OAuth, avoiding the ad hoc solutions of each CLI, whose permissions can be less granular.
Several of my subagents (particularly the Analyst and the Fixer) can be called by the Supervisor to retrieve issue or project context (via GitHub, GitLab, or Linear MCP servers), the documentation SSoT (via a Notion MCP), or — better yet — investigate performance and reliability issues (via Sentry, Honeycomb, or Datadog MCP servers) or even query production data (via a Snowflake MCP server). The subagent returns only the relevant information, and the Supervisor injects it into the Builder’s context, with fairly impressive results.
More generally, useful tools are those that extract actionable signal, while limiting noise and round-trips. For instance, I increasingly constrain my subagents to use ast-grep, a code search and transformation tool that operates on syntactic structure, rather than text-only searches like rg or grep. Not only is it faster, more reliable, and more precise, but it can also perform complex code transformations in a single round-trip, instead of laborious trial and error with approximate regexes.
Another category falls under general development comfort: modern linters, formatters, and test runners, along with custom commands (e.g. running only the unit tests of a single package). Just as an interminable test suite won’t be used by human contributors, fast and reliable tools will not only be used more by agents, but will also drastically reduce iteration time.
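As a sketch of such a custom command (the package name and the underlying workspace commands are assumptions, not Sampo’s actual scripts), a tiny wrapper that scopes the test run to one package might look like:

```shell
#!/bin/sh
# Hypothetical helper: run only one package's tests for fast feedback.
# The real commands are left commented so the sketch stays runnable anywhere.
test_one() {
  pkg="$1"
  echo "running tests for $pkg only"
  # cargo test -p "$pkg"          # Rust workspace (as in Sampo)
  # pnpm --filter "$pkg" test     # pnpm workspace equivalent
}

test_one sampo-core
```

The point is less the wrapper itself than giving agents a short, documented command whose scope is obvious, so they reach for it instead of the full suite.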
Hard guardrails
Soft guardrails come quite naturally to most agent users: add limits in the prompt, a few rules in the AGENTS.md, and manually review the output… But now we need passive, deterministic guardrails that don’t depend on the agent’s discipline and that can be applied to any agent, even ones we didn’t explicitly prepare for.
Of course, some automated guardrails already in place work for agents too: CI pipelines, branch protection rules, code owners, pre-commit hooks, etc. But we can also add new ones specifically designed for agents, like this preToolUse hook from Matt Pocock to block dangerous Git commands, returning a clear error message to the agent:
#!/bin/bash
INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command')
DANGEROUS_PATTERNS=(
  "git push"
  "git reset --hard"
  "git clean -fd"
  "git clean -f"
  "git branch -D"
  "git checkout \."
  "git restore \."
  "push --force"
  "reset --hard"
)
for pattern in "${DANGEROUS_PATTERNS[@]}"; do
  if echo "$COMMAND" | grep -qE "$pattern"; then
    echo "BLOCKED: '$COMMAND' matches dangerous pattern '$pattern'. The user has prevented you from doing this." >&2
    exit 2
  fi
done
exit 0

Similarly, tools such as MCP servers introduce new attack surfaces.16 It is therefore important to choose well-established tools, configure their permissions granularly (especially for write actions), and restrict each subagent to only the tools it actually needs. For instance, the Reviewer has no reason to access GitHub, and if the Analyst needs to run Snowflake queries, those should be limited to read-only.
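A nice property of such a hook is that it is plain shell, so the pattern list can be sanity-checked outside the editor. A condensed version of the same grep-based matching loop, fed with sample commands (only a subset of the patterns, for illustration):

```shell
#!/bin/sh
# Feed sample commands through the same grep-based matching the hook uses,
# and print which ones would be blocked (subset of the patterns, for brevity).
for cmd in "git push origin main" "git status" "git reset --hard HEAD~1"; do
  matched=""
  for pattern in "git push" "git reset --hard" "git clean -f" "push --force"; do
    if echo "$cmd" | grep -qE "$pattern"; then
      matched="$pattern"
      break
    fi
  done
  if [ -n "$matched" ]; then
    echo "BLOCKED: $cmd (pattern: $matched)"
  else
    echo "allowed: $cmd"
  fi
done
```

Running it prints `BLOCKED` for the push and hard reset, and `allowed` for `git status`; a useful check before trusting the hook with a live agent.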
Two types of guardrails I haven’t yet tried (but find promising) are formal verification of LLM-generated code, and static analysis of agent audit trails. The first is a very active research area: LLMs could make formal methods far more accessible,17 and proof assistants could offer a verification signal strictly more reliable than tests.18 The second leverages either the actions already logged by agents combined with hooks,19 or tools like LangFuse and LangSmith to analyse action, tool, and conversation logs. Claude Code, for instance, stores session logs under ~/.claude/projects/, and the preToolUse hook can produce custom logs. These can then be analysed to detect undesirable behaviour patterns, such as repeated attempts to use a forbidden tool, or drift in the types of commands being used.
Engineering in the agent era
In my first five years as a software developer, I quickly identified the « one who speaks code » profile. Not particularly interested in the product, business, or architecture… but able to read the codebase, recall its structure (not documented), its legacy quirks (not tested), and wire together APIs into whatever management requested.
But as AI grows capable of reading codebases, explaining them, writing documentation, taking natural-language instructions, implementing features, debugging, and writing tests to validate its own work… what is left for developers whose value lies precisely in deciphering those mystical lines?
In the coming years, what I described in my previous article as ways to stand out may simply become the new baseline: challenging business requirements with technical insights, navigating company politics in search of workable compromises, defining and enforcing conventions for code, documentation, tests, infrastructure, etc.
The good news is that, in the meantime, simple engineering practices can still boost our impact and value, and the effectiveness of the LLM tools we already use.
Textual Single Source of Truth
One of the single best things you can do to help agents and humans work together is to maintain a clear, concise, and up-to-date textual documentation as the single source of truth (SSoT) for your project, with each section easily accessible and maintained by a clearly identified owner.
There are already good articles out there about the power of SSoTs for human contributors. But for agents, this source should ideally be plain-text files (Markdown, ADRs) living alongside the code, or a dedicated service reachable by agents through an MCP server (e.g. Notion). Access rights — read vs write — should be granular, so that agents, developers, product/doc owners, and designers each see what they need.
One key point is to avoid duplication. Agents, tech contributors, and non-tech stakeholders should all refer to the same source of truth, without creating multiple conflicting versions. Keeping things concise, avoiding implementation details, and including as little perishable information as possible also helps keep the documentation up-to-date and relevant for everyone.20
Among tools I’m currently underutilizing: SpecKit is starting to bridge the gap between living specs and generated code, making it easier to keep both in sync. And infrastructure as code (e.g. Terraform) is another good example of living documentation: easily accessible and modifiable by agents, with a direct impact on production.
Tests and observability
Still in the same logic of structuring the environment, tests and observability are among the most valuable guardrails.
This won’t surprise TDD advocates, but agents are proving the point: tests are not a « nice to have » that catches the occasional bug, they are living documentation, a « must have » to protect against regressions and ship with confidence.21 As software grows, no context window or human memory can hold every expected behaviour, every edge case, and every business quirk, without tests as the single source of truth. They also enforce progressive disclosure: an agent working on a well-tested codebase will bump into failing tests, and discover all the relevant context to fix them.
Observability completes the picture. Service Level Objectives (SLOs) for performance and reliability give a clear, actionable signal, far more useful than the noisy alerts everyone learns to ignore. Deeper down, structured logs and metrics provide the data to diagnose issues and confirm that fixes work. And as mentioned earlier, agents can pull all of this through MCP servers connected to your monitoring stack.
Above all, both act as passive guardrails, independent of the contributor’s discipline. Even if an agent didn’t run the test suite, the CI pipeline will catch the regression and block the merge. Even if a developer forgets to watch the dashboards, alert notifications will still fire. When more and more code is generated autonomously, these automated safety nets matter far more than any prompt.
Accountability
These safety nets benefit human contributors, but they become even more critical for increasingly autonomous agents, armed with more tools, yet still unreliable and prone to hallucinations. And while models improve rapidly, nothing suggests their short-lived nature will change any time soon. Agents are invoked, then abandoned. They execute, produce output, and dissolve — without awareness of the consequences of their actions. Accountability cannot be delegated to an ephemeral entity, and therefore it stays with the engineer who set the task in motion.22
This is not a philosophical footnote. Every guardrail discussed in this article (tests, observability, tool restrictions, structured reviews) is also a professional responsibility, to remain in control of what runs on your behalf,23 and to ensure the code you ship still reflects your standards.24 As our systems increasingly run on autopilot, this accountability may become one of the key reasons to keep a human in the loop.
Agentic development also introduces new risks and, with them, new responsibilities. While AI is a powerful lever for senior engineers, its benefits for juniors remain hard to observe.25 Worse, these tools can slow down their understanding and learning.26 This raises pressing questions about how to mentor and grow junior developers in a world where value lies increasingly in deep architectural and systemic understanding, rather than in the mere ability to write code.
On the good side, if agents absorb an ever-larger share of routine implementation, the time freed up doesn’t vanish, but shifts. Towards the harder conversations: challenging a vague requirement before a line is written, mentoring a junior developer on a legacy codebase, pushing back on a product decision that solves the wrong problem, or spending an afternoon with colleagues improving architecture, documentation, or test conventions, so that both agents and humans can do better work tomorrow!
Reducing the noise
We are witnessing a technical revolution. Engineers built the neural networks and training methods, yet LLMs quickly escaped our understanding. While researchers can study their inner workings to better guide them, for us developers this is above all a time of experimentation, trial and error, and discovery.
The surprising finding is how familiar these agents turn out to be. Far from requiring magic prompts and cryptic tooling, they ultimately reinforce good engineering practices we already know. A poorly documented, under-tested codebase with no clear conventions was already a problem for human contributors; agents just make that debt more costly and more visible.
And while models will keep improving, what remains truly valuable in a human developer is not technical but systemic and political: challenging requirements, mentoring juniors, defending architecture decisions, being accountable. These are exciting challenges for software engineers, especially those who believe their mission goes beyond writing code.
Today, working effectively with agents is mostly about managing their context and reducing the noise. That is good advice for humans too, in a period where social media is flooded by actors — whether AI-fanatics or AI-doomers — generating noise to attract attention, sell courses, build audiences, or push agendas. We could do a better job surfacing concrete experiences, individual workflows, scientific papers, and honest retrospectives.
Of course, the concerns are real: security risks, the uncertain place of juniors in this new landscape, etc. And as a citizen, I worry about the impact on personal data, disinformation, electoral interference, and new technological monopolies.27
But for engineers — especially those who call themselves crafters or builders — this is a thrilling and tremendously fun moment. New tools, new ways of working, new ways of collaborating with technology. Each new model generation or tool shakes up workflows and lets us prototype, experiment, and refactor even more freely. And with a well-structured repo, maybe even maintaining high code quality while doing so!
I look forward to seeing how things evolve… and coming back to this article to laugh at my obvious mistakes and my lousy predictions!