Daily-Driving AI at Enterprise Level

At enterprise scale, the real challenge isn't making AI agents more capable — it's keeping them accurate across a codebase too large for any context window. I built a second-brain structure of issues, facts, spikes, and terms, backed by skills, subagents, and hooks, to reduce the blast radius of agent mistakes and keep the model grounded across sessions and projects.

Gustavo Santos May 23, 2026

Working with software used to mean writing every line yourself. Then 2026 came, and with it a breakthrough in how LLMs apply to daily workflows. New models feel more intelligent than ever before: they can handle tasks on their own, with high confidence of seeing them through to the end.

Our jobs as engineers are rapidly shifting toward managing agents and making sure the work they take on gets done the right way.

To be honest, I find this very straightforward in fresh projects. Even without the help of spec tools, models can bootstrap a project and carry out work on codebases that aren’t that complex.

But at enterprise level, there is so much information that handling knowledge is, from my point of view, the most challenging task. You’re not dealing with one codebase — you’re dealing with a constellation of services, teams, domains, and histories that no single context window can hold. An agent drifting from reality is expensive in two distinct ways: token-expensive and cognitive-expensive.

Token-expensive mistakes are at least legible. The model goes down a wrong path, burns tokens getting there, and you see the bill. Painful, but quantifiable — money solves it.

Cognitive-expensive mistakes are different. They don’t announce themselves. A small, subtly wrong change passes review, ships to production, and sits quietly until something breaks. By then the cost isn’t tokens — it’s incident response, lost trust, and hours of human time tracing back what went wrong. A business won’t last long absorbing those.

This is why I believe the next challenge in software engineering is controlling the mistakes an agent can make: reducing the blast radius by managing information.

So, how am I managing information?

I ended up creating a “second-brain”-like structure. It’s nothing fancy; it’s the simplest thing I could shape while pairing with AI, and I believe it fits well with the current frontier models. The structure of information follows some constraints:

  • Information should be small and composable
  • Information should link together
  • Information should be verified
  • Information should be buildable on top of other information
  • Information aging should be natural
  • Information should be reconciled periodically

These constraints led me to four scopes:

  • Issues: central tracking and execution unit for development work.
  • Facts: empirical, behavioral claims about the codebase.
  • Spikes: research logs generated during the codebase investigation.
  • Terms: normative business domain concepts.
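
To make the constraints and scopes above concrete, here is a minimal sketch of how one entry could be modelled. It is an illustration, not the actual layout of my knowledge base: the field names, the 90-day threshold, the FACT-NNN and SPIKE-NNN ID formats (modelled on TERM-NNN) and the sample entries are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Entry:
    id: str                      # hypothetical ID scheme, e.g. "FACT-001"
    kind: str                    # "issue", "fact", "spike" or "term"
    body: str                    # kept short so entries stay composable
    links: List[str] = field(default_factory=list)  # IDs of related entries
    verified_on: Optional[date] = None               # None means never verified


def is_stale(entry: Entry, today: date, max_age_days: int = 90) -> bool:
    """An entry ages out if it was never verified or was verified too long ago."""
    if entry.verified_on is None:
        return True
    return (today - entry.verified_on).days > max_age_days


def dangling_links(entries: List[Entry]) -> Dict[str, List[str]]:
    """Map each entry ID to the links that point at entries missing from the set."""
    known = {e.id for e in entries}
    broken: Dict[str, List[str]] = {}
    for e in entries:
        missing = [link for link in e.links if link not in known]
        if missing:
            broken[e.id] = missing
    return broken


# Illustrative data only.
kb = [
    Entry("TERM-001", "term", "Settlement: the final transfer of funds.",
          verified_on=date(2026, 5, 1)),
    Entry("FACT-001", "fact", "Settlement jobs are idempotent.",
          links=["TERM-001", "SPIKE-009"], verified_on=date(2026, 1, 10)),
]
today = date(2026, 5, 23)
stale_ids = [e.id for e in kb if is_stale(e, today)]
broken = dangling_links(kb)
```

A periodic reconciliation pass could start from `stale_ids` and `broken`: stale entries get re-verified against the code, and dangling links get repaired or dropped.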

The structure of issues, facts, spikes and terms helps me navigate between multiple projects and keep track of context, limitations, behaviors, business domains and so on. It is the main source of information for Claude Code and Cursor. Yes, I have access to both and I work with both.

This structure is supported by some skills and subagents.

I tend to define skills with as little text as possible. I see skills as protocols that I want an agent to follow, and, as with humans, a protocol bloated with text is poorly followed.

The heavy work happens under subagents, which use a full context window to search, connect, shape and distill information for me. My knowledge base, for instance, is entirely managed by agents.

I also rely on hooks. Hooks are the only deterministic tool in the process of pairing with AI, so I use them for two things:

  • Capture what the agent did to “dream on it” afterwards.
  • Inject signals into the agent’s work.

Capturing the agent’s work is the most boring part. I just summarize the agent’s work by reading the transcript and storing the summary as markdown in a directory. Async subagents then “dream” on the data and do two major things: (1) propose updates to CLAUDE.md that add directions based on corrections, and (2) reconcile the knowledge base by connecting, merging, updating and deleting facts.
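
The storage half of that capture step can be sketched as a small pure function. This is only an illustration under assumptions: the file naming, the section headings and the `touched_files` input are hypothetical, and the summary itself is taken as given (producing it from the transcript is out of scope here).

```python
from datetime import datetime
from typing import List, Tuple


def render_session_note(session_id: str, summary: str, touched_files: List[str],
                        when: datetime) -> Tuple[str, str]:
    """Return (filename, markdown) for one captured session."""
    # Date-prefixed names keep the directory sorted chronologically.
    filename = f"{when:%Y-%m-%d}-{session_id}.md"
    lines = [
        f"# Session {session_id}",
        "",
        f"Captured: {when.isoformat(timespec='minutes')}",
        "",
        "## Summary",
        "",
        summary.strip(),
        "",
        "## Files touched",
        "",
    ]
    # De-duplicate and sort so the note is stable across runs.
    lines += [f"- {path}" for path in sorted(set(touched_files))]
    return filename, "\n".join(lines) + "\n"


# Illustrative call with made-up values.
name, note = render_session_note(
    "abc123",
    "Renamed the retry helper. ",
    ["src/retry.py", "src/retry.py", "README.md"],
    datetime(2026, 5, 23, 14, 30),
)
```

Keeping the rendering separate from the file write makes the deterministic part easy to check; the hook only has to write `note` to `name` inside the capture directory.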

I also use subagents to automatically perform spikes based on the knowledge base data. They work following an adaptation of the scientific method, formulating and validating hypotheses with the objective of gradually expanding the knowledge base.

Injecting signals is the most interesting part. Every file the agent touches is a potential blind spot: the model may have enough context to make an edit, but not enough to make it correctly. By the time the agent opens a file, it should already know the behavioral constraints, domain terms, and known quirks documented about that file in the knowledge base.

The knowledge base is vectorized and indexed by qmd, which is the mechanism that enables agents to search for information. When a hook fires on a file edit, qmd queries the knowledge base for relevant entries and injects them into the model’s context before the edit happens.

This works for facts. If the knowledge base has a fact like “this module assumes idempotent operations — upstream retries depend on it,” that fact gets surfaced when the agent touches the file. The model doesn’t have to infer the invariant from the code — it already knows, and is far less likely to break it.
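
The shape of that injection can be sketched as follows. Two things here are assumptions rather than a description of the real setup: a plain path-glob match stands in for qmd's vector search, and the payload shape (a `tool_input` object carrying `file_path`) is what I assume the hook receives. The fact IDs and the second fact are made up for illustration.

```python
from fnmatch import fnmatch
from typing import Dict, List

# Hypothetical knowledge base: each fact lists the path globs it applies to.
FACTS = [
    {"id": "FACT-012", "paths": ["services/billing/*"],
     "text": "This module assumes idempotent operations; upstream retries depend on it."},
    {"id": "FACT-031", "paths": ["services/search/*"],
     "text": "Index rebuilds run nightly."},
]


def facts_for_file(file_path: str, facts: List[Dict]) -> List[Dict]:
    """Stand-in for the search step: keep facts whose globs match the path."""
    return [f for f in facts if any(fnmatch(file_path, g) for g in f["paths"])]


def build_injection(payload: Dict, facts: List[Dict]) -> str:
    """Build the text to put in front of the model before the edit happens."""
    file_path = payload.get("tool_input", {}).get("file_path", "")
    relevant = facts_for_file(file_path, facts)
    if not relevant:
        return ""  # nothing known about this file, inject nothing
    header = f"Known constraints for {file_path}:"
    return "\n".join([header] + [f"- [{f['id']}] {f['text']}" for f in relevant])


# Assumed payload shape for an edit event.
payload = {"tool_name": "Edit", "tool_input": {"file_path": "services/billing/charge.py"}}
context = build_injection(payload, FACTS)
```

Returning an empty string when nothing matches matters: the hook should stay silent for files the knowledge base knows nothing about, so the injected context stays small.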

It also works for vocabulary. If I use a term the model wouldn’t naturally understand, qmd finds the matching TERM-NNN file and injects its definition. The model knows what I mean without me explaining it in the prompt every time.
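
A simplified version of the vocabulary lookup is sketched below. It uses exact, case-insensitive name matching as a stand-in for what qmd does, so it will miss plurals and paraphrases that a vector search would catch. The glossary entries are invented for illustration.

```python
import re
from typing import Dict, List

# Hypothetical glossary keyed by TERM-NNN ID.
GLOSSARY = {
    "TERM-001": {"name": "cold account",
                 "definition": "An account with no activity in 90 days."},
    "TERM-002": {"name": "chargeback",
                 "definition": "A payment reversed by the card issuer."},
}


def terms_in_prompt(prompt: str, glossary: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one definition line per glossary term named in the prompt."""
    found = []
    for term_id, term in sorted(glossary.items()):
        # Word boundaries avoid matching a term inside a longer word.
        pattern = r"\b" + re.escape(term["name"]) + r"\b"
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            found.append(f"{term_id} ({term['name']}): {term['definition']}")
    return found


definitions = terms_in_prompt("Flag every Cold Account before export", GLOSSARY)
```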

Having the knowledge base indexed also allows me to discover and connect information across projects. While working on one codebase, I can surface relevant facts from others that share a domain, a service boundary, or a feature.

Every piece of work is tracked by issues. Issues are like “living plans” stored on disk, easily accessible by any agent, and designed to survive multiple sessions. They are the artifacts that allow me to work in parallel across different worktrees on the same task. They hold behavior-driven descriptions of features — readable enough for me and precise enough to keep the model aligned.

Spikes are documents that answer a given question and support a hypothesis. They become the source of facts once backed by observations in the source code. Spikes are ephemeral by nature, but I don’t delete them — they form a solid base for historical analysis and contextual tracing, and are valuable assets when debugging production issues.
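
The step from spike to fact can be sketched as a small promotion rule. The types, field names and the rule itself (at least one observation, and every observation must support the hypothesis) are assumptions made for illustration; the sample spike is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Observation:
    source: str      # where it was seen, e.g. a file path and line
    supports: bool   # whether it supports the hypothesis


@dataclass
class Spike:
    id: str
    question: str
    hypothesis: str
    observations: List[Observation] = field(default_factory=list)


def promote_to_fact(spike: Spike, fact_id: str) -> Optional[Dict]:
    """Return a fact only when the hypothesis is backed by code observations."""
    if not spike.observations:
        return None  # an untested hypothesis never becomes a fact
    if not all(o.supports for o in spike.observations):
        return None  # contradicting evidence blocks promotion
    return {
        "id": fact_id,
        "text": spike.hypothesis,
        "derived_from": spike.id,  # keeps the spike traceable later
        "evidence": [o.source for o in spike.observations],
    }


# Illustrative spike with made-up content.
spike = Spike(
    id="SPIKE-009",
    question="Can the charge handler be retried safely?",
    hypothesis="The charge handler is idempotent.",
    observations=[Observation("services/billing/charge.py:42", True)],
)
fact = promote_to_fact(spike, "FACT-012")
```

The `derived_from` field is what makes keeping old spikes worthwhile: a fact that turns out to be wrong can be traced back to the investigation that produced it.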


The enterprise challenge isn’t making agents more capable — frontier models are already remarkably capable. The challenge is keeping them accurate across a large, interconnected codebase where a single wrong assumption can ripple into a production incident. The structure above is my answer to that: not a way to make agents smarter, but a way to keep them grounded.

This is the workflow I found most suitable for my kind of job and the environment I am in. I hope it gives you some inspiration to find your own. If you want to dig deeper into the specifics of mine, take a look at https://github.com/gustavofsantos/skills.