How AI and Agentic Development Are Reshaping Software Engineering

For most of the last two decades, the scarce skill in engineering was producing the artifact: writing the code, wiring the services, getting the thing to run. Implementation was the bottleneck, and the entire profession was organized around it: how you were hired, how you were evaluated, how you grew.

AI has moved that bottleneck. When a model can draft a working service, a migration plan, or a system diagram in seconds, the act of producing a plausible artifact stops being a differentiator, because everyone can now produce one. The constraint is no longer “can you build it?” The constraint is “do you know what to build, and why that instead of the alternatives?”

Agentic development pushes this further. An agent does not just emit a snippet; it executes a multi-step plan, calls tools, and produces working output across an entire workflow. The human is no longer the one doing the steps. The human is the one:

framing the problem correctly,
setting the constraints the agent must operate within,
reviewing the output critically,
and owning the decision when something goes wrong.

This changes what seniority means:

A junior engineer used to be someone who needed guidance on implementation. In an agentic world, a junior engineer is someone who accepts the first output the model produces.
A senior engineer is someone who knows when to reject it, what is missing from it, and how to prompt the system toward a better answer.
A principal engineer is someone who can look at three AI-generated options and explain, with precision, which one is right for this context, why the others fall short, and what would have to be true for each of them to become the right answer.

But rejecting and comparing are not about spotting failure modes the model cannot see. A capable agent will often name them for you: the unbounded queue, the cache quietly serving stale data, the migration with no rollback, the synchronous call that cascades.

What the agent cannot do reliably is tell you which of these actually matters here, weigh it against constraints it does not know, and own the consequences if it is wrong. Its recognition is real but inconsistent and context-blind. Your job is to be the layer that makes it reliable:

deciding which risk dominates given the real SLAs, team, and budget,
catching the confident mitigation that is subtly wrong,
and owning the outcome regardless of what produced the artifact.

There is a second half to the role, and it is just as important and far more scalable. An agent is only as good as the environment it runs in. Left to a bare prompt, it improvises differently every session, drifts from team conventions, and re-derives decisions that were settled months ago. The senior is the person who removes that variance ahead of time by building the rails the agents and the team run on:

the harnesses that catch mistakes automatically: tests, CI, type systems, linters, and evals, so a wrong answer fails loudly instead of shipping silently,
the captured tribal knowledge: architecture decision records (ADRs), conventions, runbooks, and context files that encode how things actually work here,
and the shared context that travels, so an agent behaves consistently across sessions, every engineer on the team gets reproducible results, and the knowledge compounds across teams instead of living in one person’s head.

This is where the leverage is. Reviewing one agent’s output is linear effort. Building the harness and the context that make every agent and every teammate produce reliable output is the modern form of what senior engineers have always done: turning hard-won, tacit knowledge into something the whole organization can depend on, reproducibly.

The scarce skill, then, has shifted from execution to judgment:

choosing which option among several, not generating one option,
knowing what to reject and why,
naming the trade-off the AI glossed over,
and defending the decision when someone pushes back.

This is not a temporary adjustment. It is a structural shift in what software engineering is. The engineers who thrive in this era are not the ones who resist AI or merely tolerate it, but the ones who develop a new layer of craft on top of it:

the ability to reason clearly about systems,
communicate decisions precisely,
and take accountability for outcomes that an agent helped produce but a human must own.

That craft is not innate. It is a method. And like any method, it can be learned, practiced, and made repeatable. The sections below break it down into concrete habits:

a template for structuring a decision,
a way of using AI that adds judgment rather than just speed,
a toolkit of patterns to keep sharp,
a discipline for defending every decision,
and a habit of capturing decisions so they compound.

A single running example — Acme.ai — threads through the whole section, so each principle shows up both in the abstract and applied in practice.

How hiring is adapting

As the scarce skill moves from implementation to judgment, hiring is moving with it. Live coding puzzles and algorithm trivia are giving way to scenario-based assessments where AI assistance is not just permitted but assumed. The question being asked has changed. It is no longer “can you produce a solution,” because a model can. It is “can you produce one you can stand behind.”

The artifact you submit is not the deliverable; it is the opening move in a conversation. Someone will probe the reasoning, challenge the choices, and raise alternatives you did not consider. The people who do well are the ones who wrote with that in mind, the same way they would for a real design review: assumptions made explicit, rejected options named, a trail of judgment rather than a display of output.

The method

Every part of this method is easier to grasp in motion than in the abstract. Here is the running example the rest of this section works through.

Acme.ai is a multi-tenant B2B analytics SaaS. Customers build dashboards and export reports. Today, when a user clicks “Export report,” the web server runs the query, renders a PDF or CSV synchronously, and returns it in the HTTP response. This worked at launch.

Now the largest tenants have datasets 50x bigger: exports take 30 to 90 seconds, hit the 60-second gateway timeout, and saturate web workers during business hours, degrading the whole app for every tenant.

Product wants two new capabilities: scheduled reports (“every Monday 8am, email me the weekly summary”) and large exports (up to roughly 1M rows) that currently fail outright. Evolve the reporting feature.

The method is five habits. Only one of them is something you produce — the structured decision at the center; the other four are disciplines that surround it.

Before you start: Keep the fundamentals in your head. Know the core patterns cold, so the right one surfaces the moment a problem calls for it.

Throughout: Use AI as a thinking partner, not an answer machine. Let the model generate options and name risks; you prune, reject, and sharpen what remains.

The artifact you produce: Structure the decision with a repeatable template. This is the engineer's work, the one thing you actually create. The other four habits exist to make this one defensible.

A pass across every part: Defend every decision before anyone asks. Pressure-test every choice against the obvious alternative before a reviewer does.

After, so it compounds: Capture the decision so it compounds. Write it into ADRs, fitness functions, and context files so the next person and agent inherit it.

Keeping the fundamentals in your head is a precondition: they have to already be there when the problem lands, because you recall them in the moment rather than learn them then. Using AI runs through the whole process. Defending every decision is a pass over the entire template, applied to every choice in it — constraints, options, recommendation, risks — not a single step performed after the recommendation. And capturing the decision happens once the dust settles.

Structure the decision with a repeatable template: UP/DowN

Give the template a name, because a named structure is one you actually reach for under pressure. I call it UP/DowN — Understand, Plan, Decisions, Notes — and the casing marks the two halves. UP is the groundwork: the situation you are in, and the plan to change it. DowN is the judgment and its residue: which option you chose and why, and what the next person has to watch because of it.

U — Understand: restate the problem, the constraints, and the assumptions you are making, and name the blast radius — what this touches and what must not break,
P — Plan: the migration or rollout path that ships your chosen approach without downtime,
D — Decisions: the two to three candidate approaches (including one deliberately boring one), the trade-off matrix, and the recommendation — the choice, the alternatives it beat, and why they lost here,
N — Notes: the risks, what you would measure or validate first, and the side-effects and caveats that fit nowhere else.

These are the same four sections you would put in a written design doc, and the running example below fills in all four. It works through them in the order U, D, N, P rather than the order of the acronym: for a design decision like this one, the Plan comes last, because you cannot plan a rollout until you have chosen what you are rolling out.

A fixed template lets you spend your attention on the thinking instead of on the layout, and it makes your reasoning legible to whoever reads it next: a teammate, a reviewer, or you in six months. UP holds up the DowN: without the groundwork, the decision you write down is just an opinion with formatting. Keep each part tight; signal density beats length.

Applied to Acme.ai:

Each part below pairs the prompt that drives it with what comes back. The prompts are turns in one session: you hand the agent the brief and the codebase in the first one, and it carries that context through the rest.

U — Problem, constraints, and assumptions.

You delegate the exploration; you set the boundaries it must respect.

Here is the system and the change: [paste the Acme.ai brief above]. Read the codebase,
then map the blast radius for the export change: which components, tables, and endpoints
it touches, what breaks, what must not break, and the hard constraints I cannot violate —
the 60-second gateway timeout, the live synchronous endpoint with real consumers, and
per-tenant isolation.

What you commit to the section — the agent’s map, plus the constraints only you can state:

The core problem:

Synchronous, in-request generation does not scale.
Heavy jobs occupy shared web workers, causing noisy-neighbor degradation across all tenants.

Stated assumptions:

single AWS region,
Aurora Postgres as the primary store,
the existing synchronous endpoint has live API consumers that cannot be broken,
small team, so operational simplicity matters,
cost-sensitive.

Non-functional targets:

exports of up to 1M rows complete within 5 minutes,
app latency is unaffected,
no duplicate scheduled sends,
artifacts retrievable for 7 days.

D — Candidate approaches.

You delegate the option space, but reserve the choice.

Generate three approaches: one deliberately boring and minimal, two meaningfully
different. For each, give me: how it works in a few sentences; a Mermaid diagram of the
request and data flow where it clarifies the design; the schema and API changes; the
migration and rollback story; the trade-offs it carries; and its known limitations.
Do not recommend one — I will choose.

What it generates — three options to choose among, not a recommendation:

Option A (boring): keep generation synchronous, but raise the timeout, move exports to a dedicated worker pool behind a separate endpoint, and add a read replica so heavy queries stop hitting the primary. Minimal change.
Option B (async job queue): “Export” enqueues a job; workers generate the artifact, stream it to S3, and notify the user via email and in-app link. Scheduled reports reuse the same job, fired by a scheduler. One mechanism serves both new features.
Option C (managed serverless): offload generation to Step Functions and Lambda, artifacts to S3, EventBridge for scheduling. Least infrastructure to run, but new operational surface and runtime/size ceilings for very large jobs.

The core of the recommendation is a change in the shape of the request path. The current design does everything inline; Option B moves generation off it entirely.

Current (synchronous):

sequenceDiagram
    actor User
    participant API as Web server
    participant DB as Postgres
    User->>API: Click "Export report"
    API->>DB: Run query (30-90s)
    DB-->>API: Rows
    API->>API: Render PDF / CSV
    API-->>User: File in HTTP response
    Note over User,API: Request blocks for the whole generation — web workers saturate and the app degrades

Proposed (Option B, asynchronous):

sequenceDiagram
    actor User
    participant API as Web server
    participant Q as Job queue
    participant W as Worker pool
    participant DB as Read replica
    participant S3 as S3
    participant N as Notifier
    participant Sch as Scheduler
    User->>API: Click "Export report"
    API->>Q: Enqueue job
    API-->>User: 202 Accepted (immediate)
    Sch->>Q: Enqueue scheduled job (cron)
    W->>Q: Pull job
    W->>DB: Run query
    W->>S3: Stream artifact in chunks
    W->>N: Job done
    N-->>User: Email + in-app link
    Note over User,API: Request returns immediately — generation runs off the request path
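The first three arrows of the proposed flow (enqueue, then return 202 at once) can be sketched in a few lines. This is a minimal illustration, not Acme.ai's actual handler: `ExportJob`, `job_queue`, and `handle_export_request` are hypothetical names, and an in-process queue stands in for whatever broker the team would really use.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExportJob:
    job_id: str
    tenant_id: str
    report_id: str
    fmt: str  # "csv" or "pdf"


# Stand-in for a real broker; an in-process queue keeps the sketch runnable.
job_queue = queue.Queue()


def handle_export_request(
    tenant_id: str, report_id: str, fmt: str = "csv"
) -> Tuple[int, Dict[str, str]]:
    """Enqueue the export and return immediately; no query or rendering runs here."""
    job = ExportJob(
        job_id=str(uuid.uuid4()),
        tenant_id=tenant_id,
        report_id=report_id,
        fmt=fmt,
    )
    job_queue.put(job)
    # 202 Accepted: the work has been accepted but is not finished yet.
    return 202, {"job_id": job.job_id, "status": "queued"}
```

The handler does nothing whose cost grows with dataset size, which is the property the diagram's closing note describes.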

D — Trade-off matrix.

You name the dimensions that matter here; it fills the grid.

Lay the three approaches against the dimensions that matter here — 1M-row capacity, app
latency, scheduled-report support, build effort, ops cost, reversibility — as a table.

What it produces:

| Dimension | A: beefed-up sync | B: async queue | C: serverless |
| --- | --- | --- | --- |
| Handles 1M-row export | Weak | Strong | Weak (runtime limits) |
| Protects app latency | Partial | Strong | Strong |
| Enables scheduled reports | No | Yes | Yes |
| Build effort | Low | Medium | Medium-high |
| Ongoing ops cost/complexity | Low | Medium | Low-medium, new surface |
| Reversibility | High | Medium | Medium |

D — Recommendation and reasoning.

You make the call — then have the agent attack it.

I am leaning toward Option B because it removes generation from the request path and one
job abstraction serves both features. Make the strongest case against B, and the best
case for the option I am rejecting. What am I missing?

Your call, after weighing the pushback:

Option B is the core, folding in Option A’s read replica as a supporting move:
- B removes generation from the request path, the actual root cause of the noisy-neighbor problem.
- The same job abstraction serves both scheduled reports and large exports: one mechanism to build and operate, not two.
- Build effort is medium: the queue, workers, S3 upload, and notification wiring are well-understood primitives with mature libraries, but they still require deliberate work around idempotency, dead-letter handling, and autoscaling.
Option C is deferred, not discarded.
- At current scale it adds a new failure surface for marginal benefit.
- At 10x volume, the calculus flips: worker fleets become expensive to right-size and operate, cold-start latency matters less when jobs are long-running, and Step Functions’ built-in retry, state management, and observability start paying for themselves.
- Build effort is medium-high: it adds a new execution environment to learn and configure (Lambda packaging, Step Functions state machine definitions, EventBridge rules), introduces IAM surface area, and requires the team to reason about cold starts, Lambda runtime limits, and distributed tracing across managed services, none of which come for free even when the infrastructure is managed.

N — Risks and what to validate first.

You turn the agent loose on its own design, one hostile lens at a time.

Stress-test this design as four reviewers in turn — a security reviewer, an SRE, a cost
owner, and a future maintainer. For each, name the failure mode I would most regret
ignoring.

The lenses surface what a single perspective misses:

Security reviewer: S3 artifacts hold tenant data — short-TTL signed URLs, encryption at rest, tenant-scoped keys so tenant A can never fetch tenant B’s export.
SRE: a dead-letter queue, alerts on backlog depth and failure rate, bounded retries, and a guarantee that one stuck job cannot block the queue.
Cost owner: autoscale workers on queue depth, an S3 lifecycle rule to expire artifacts after 7 days, and the read replica as the main recurring cost to justify.
Future maintainer: one job abstraction for both features, a documented idempotency contract, and EventBridge or cron for scheduling rather than a hand-rolled scheduler.

From there, the risks to retire first — and the cheapest way to retire them:

Duplicate scheduled sends (scheduler fires twice, or a worker retries mid-send): idempotency keys per (schedule_id, period), at-least-once delivery plus dedupe.
Queue backlog starving scheduled jobs during peak hours: separate queues and priorities for interactive versus scheduled work.
Memory blowup on the 1M-row export: stream to S3 in chunks, never buffer the full result set in memory.

Validate the cheapest thing first: spike one worker generating a 1M-row CSV streamed to S3, measure wall-clock time and peak memory, and commit to the full design only once those numbers are in.
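A spike like that needs little more than a chunked writer. The sketch below is one way to keep memory bounded, under assumptions: `csv_chunks` and `stream_export` are hypothetical names, and `upload_part` is a stand-in callback for an S3 multipart upload rather than a real client call. The 5 MiB default reflects S3's minimum size for every part except the last.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, Sequence

DEFAULT_CHUNK_CHARS = 5 * 1024 * 1024  # every character is at least one UTF-8 byte


def csv_chunks(
    rows: Iterable[Sequence[object]], chunk_chars: int = DEFAULT_CHUNK_CHARS
) -> Iterator[bytes]:
    """Yield encoded CSV chunk by chunk, holding at most one chunk in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in rows:
        writer.writerow(row)
        if buf.tell() >= chunk_chars:
            yield buf.getvalue().encode("utf-8")
            # Reset the buffer so memory use does not grow with the export size.
            buf.seek(0)
            buf.truncate(0)
    if buf.tell():
        yield buf.getvalue().encode("utf-8")


def stream_export(
    rows: Iterable[Sequence[object]],
    upload_part: Callable[[int, bytes], None],
    chunk_chars: int = DEFAULT_CHUNK_CHARS,
) -> int:
    """Send each chunk to the sink as a numbered part; return the part count."""
    part_number = 0
    for part_number, chunk in enumerate(csv_chunks(rows, chunk_chars), start=1):
        upload_part(part_number, chunk)
    return part_number
```

If `rows` is a server-side cursor or generator rather than a list, the full result set never sits in memory, which is the number the spike is meant to confirm.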

P — Migration and rollout.

You delegate the draft; you own the rollback trigger.

Draft the rollout: a strangler-fig migration behind a feature flag from the synchronous
path to the async one, then canary for every change after. Include the rollback trigger
and the telemetry that gates each expansion.

What it drafts — a one-time migration, then safe releases afterward:

Strangler-fig is named after a vine that grows around an existing tree, gradually replacing it without ever cutting it down. In software, it means introducing a new path alongside the old one, routing traffic to it incrementally via a feature flag, and retiring the old path only once the new one has proven itself. The two endpoints coexist for the entire duration of the migration.

flowchart LR
    REQ([Export request]) --> FF{Feature flag}
    FF -->|async criteria met| ASYNC[New async endpoint]
    FF -->|not yet migrated| SYNC[Old sync endpoint]
    ASYNC --> Q[(Queue)] --> W[Workers] --> S3[(S3)] --> N[Notify user]
    SYNC --> DB[(Postgres)] --> R[Render inline] --> HTTP[HTTP response]
    style ASYNC fill:#d4edda,stroke:#28a745
    style SYNC fill:#ffd7d7,stroke:#dc3545

The feature flag criteria expand over time: first large tenants only, then most tenants, then all of them. Once telemetry shows the async path covers every case, the sync endpoint is deprecated and removed. That is the strangler-fig complete.
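The flag itself can be very small. A sketch of the routing decision, assuming a hypothetical `ExportFlag` with a tenant allowlist and an everyone switch; a real flag service would supply these values at runtime.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ExportFlag:
    # Tenants already migrated to the async path (hypothetical flag state).
    async_tenants: Set[str] = field(default_factory=set)
    # Final stage of the migration: every tenant goes async.
    all_tenants: bool = False


def route_export(flag: ExportFlag, tenant_id: str) -> str:
    """Return which path serves this tenant's export: "async" or "sync"."""
    if flag.all_tenants or tenant_id in flag.async_tenants:
        return "async"
    return "sync"
```

Expanding the rollout means adding tenants to the allowlist, and rolling back means removing them. Neither needs a deploy, which is what keeps each step reversible.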

Canary deployment takes over from that point. It is not about replacing one system with another; it is about releasing a new version of the same system safely. A small percentage of traffic goes to the new version while the rest stays on the stable one. If error rates and latency stay within bounds, the rollout expands. If not, traffic is routed back before the problem reaches most users.

flowchart LR
    Q[(Job queue)] --> CR{Canary router}
    CR -->|5 percent| NW[v2 workers]
    CR -->|95 percent| SW[v1 workers]
    NW --> M[Monitor errors and latency]
    M -->|healthy| EXP[Expand to 25, 50, 100 percent]
    M -->|unhealthy| RB[Rollback to v1]
    style NW fill:#fff3cd,stroke:#ffc107
    style SW fill:#d4edda,stroke:#28a745
    style RB fill:#ffd7d7,stroke:#dc3545
    style EXP fill:#d4edda,stroke:#28a745

The two patterns are sequential, not alternatives. Strangler-fig is the one-time migration to Option B. Canary is how every subsequent change to Option B ships once it is live.
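The canary's expand-or-rollback decision can be written as a pure function, which also makes the rollback trigger explicit and reviewable. This is a sketch under assumptions: the stage percentages come from the diagram above, while the thresholds (`max_error_delta`, `max_latency_ratio`) and the `CanaryMetrics` shape are illustrative placeholders, not recommended values.

```python
from dataclasses import dataclass

STAGES = (5, 25, 50, 100)  # traffic percentages from the canary diagram


@dataclass(frozen=True)
class CanaryMetrics:
    error_rate: float      # fraction of failed jobs, 0.0 to 1.0
    p95_latency_s: float   # 95th percentile job duration in seconds


def next_canary_percent(
    current: int,
    canary: CanaryMetrics,
    baseline: CanaryMetrics,
    max_error_delta: float = 0.01,
    max_latency_ratio: float = 1.2,
) -> int:
    """Return the next traffic percentage for v2; 0 means roll back to v1."""
    unhealthy = (
        canary.error_rate > baseline.error_rate + max_error_delta
        or canary.p95_latency_s > baseline.p95_latency_s * max_latency_ratio
    )
    if unhealthy:
        return 0
    for stage in STAGES:
        if stage > current:
            return stage
    return 100  # already fully rolled out
```

The thresholds are the part you own. The agent can draft the function, but what counts as "unhealthy" depends on the SLAs it does not know.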

Applied to Acme.ai:

Keep the existing synchronous endpoint untouched for small and legacy exports.
Introduce the async path behind a new endpoint and a feature flag.
Route the largest tenants through async first, where the pain is greatest.
Migrate the rest as telemetry builds confidence.
Deprecate the synchronous path only once the data shows async covers the cases.
From that point on, all changes to the worker code ship through canary.

Use AI as a thinking partner, not an answer machine

Let the model draft the option space and surface things worth considering, then apply your own layer on top: pruning what does not fit the context, rejecting what looks plausible but breaks under pressure, and sharpening the reasoning behind what remains. This is where the actual engineering work happens.

Think of yourself as the director and the agent as a minion: an avatar that is only as good as its harness — the rules, commands, skills, hooks, automated checks, linters, unit tests, and documented ADRs it runs inside. That harness has to carry your team’s tribal knowledge, plus your own local knowledge, preferences, and tools. The better equipped the minion, the more accurately and efficiently it reads an entire codebase in seconds and drafts a dozen alternatives in minutes — even if it does not know which of them matters here. Sometimes a good enough harness can even equip it to tell the difference.

Let it drive the mechanical work — exploring the system, enumerating approaches, drafting the rollout, stress-testing its own design through hostile lenses like a security reviewer, an SRE, a cost owner, and a future maintainer — and you keep the part that was never mechanical: which approach fits, which risk dominates, and the rationale in your own words. “Because it is a best practice” is not a reason.

The prompts threaded through the worked example above show this division in motion. Each hands the agent the generation and reserves the judgment for you — note the recurring “do not recommend one; I will choose.” The agent’s speed is not there to produce more output; it is there to compress the parts that were never the hard part, so the time you save goes into the only part that was: owning the decision well enough to defend it when someone pushes back.

Keep the fundamentals in your head

This habit is about you, not your tooling. You cannot recognize the failure mode you cannot name, so you need to know these core patterns well enough that the right one comes to mind the moment a problem calls for it:

▢Multi-tenancy isolation

Isolation must hold at the storage layer, not just the application layer.
A bug or breach stops at the tenant boundary by design.
Three models: shared schema, separate schema, separate database. Each trades cost for isolation strength.
Migrating between models is expensive. Choose early.

■Data partitioning

A single node has a ceiling. Partitioning is how you exceed it.
Partition key choice determines hotspot risk.
Resharding an existing system is painful. Choose the key upfront.
Cross-shard joins are expensive. Design queries to stay within one shard.

□Service boundaries

Align boundaries to business capabilities, not technical layers.
Services do not share databases. Data ownership follows the boundary.
If a service cannot deploy independently, it is not truly separate.
A distributed call has real costs. Decompose with intention.

▶Strangler-fig migration

Never replace a live system all at once. Route traffic incrementally.
Old and new paths coexist during the migration.
A feature flag keeps each step reversible.
Deprecate the old path only when telemetry confirms the new one covers every case.

↻Idempotency

In a distributed system, messages arrive more than once. Design for it.
An idempotency key turns a duplicate into a no-op.
Strict ordering is expensive. Question whether you truly need it.
At-least-once plus dedupe gives the same user-visible result as exactly-once at lower cost, provided the consumer's side effects are idempotent.
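One way to make "a duplicate becomes a no-op" concrete is to let a uniqueness constraint do the dedupe. The sketch below assumes a composite key of `(schedule_id, period)`, uses an in-memory SQLite database as a stand-in for the real durable store, and treats `sent_reports`, `make_store`, and `send_once` as hypothetical names.

```python
import sqlite3
from typing import Callable


def make_store() -> sqlite3.Connection:
    """Create the dedupe table; the primary key is the idempotency key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sent_reports ("
        " schedule_id TEXT NOT NULL,"
        " period TEXT NOT NULL,"
        " PRIMARY KEY (schedule_id, period))"
    )
    return conn


def send_once(
    conn: sqlite3.Connection,
    schedule_id: str,
    period: str,
    send: Callable[[str, str], None],
) -> bool:
    """Claim the key, then send. Returns False when the key was already claimed."""
    try:
        with conn:  # commits on success, rolls back if anything inside raises
            conn.execute(
                "INSERT INTO sent_reports (schedule_id, period) VALUES (?, ?)",
                (schedule_id, period),
            )
            send(schedule_id, period)
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: absorb it silently
    return True
```

If `send` raises, the claim is rolled back so a retry can go through. A crash after the provider has accepted the email but before the commit can still produce a resend, so this narrows the duplicate window rather than closing it.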

△Caching strategies

Cache-aside, write-through, and write-behind differ in how they handle staleness.
Cache invalidation is the hard problem: which entries become wrong when data changes?
A cache returning stale data confidently can be worse than no cache at all.
Set TTLs deliberately, not as defaults.
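Cache-aside with an explicit TTL fits in a dozen lines, which helps show where the staleness decision lives. `TTLCache` is a hypothetical in-process sketch; the injectable `clock` exists only to make expiry testable.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside: callers ask the cache first and load from the source on a miss."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, load: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh
        value = load()  # miss or expired: go back to the source of truth
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Drop an entry when the underlying data is known to have changed."""
        self._entries.pop(key, None)
```

The TTL is a constructor argument with no default on purpose: whoever creates the cache has to state how stale is acceptable.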

⚖Consistency vs availability

In a network partition, you cannot have both full consistency and full availability.
Most business logic tolerates eventual consistency if the user-visible outcome holds.
Strong consistency is expensive. Quantify the need before requiring it.
Choose based on what failure looks like to the user, not what feels safer in code.

The point is not novelty. It is recall: reaching for the right pattern the instant it applies, instead of designing past it because you forgot it existed.

Applied to Acme.ai, this decision drew on:

□Service boundary decomposition

Generation moved from the web request path into a dedicated worker service.
The worker owns its queue consumer, query logic, and S3 streaming.
Web server now only enqueues and returns immediately.

↻Idempotency and event ordering

Scheduled sends use idempotency keys built from (schedule_id, period).
At-least-once delivery assumed; duplicates absorbed by the consumer.
Prevents double-sends when the scheduler fires twice or a worker retries.

△Caching and staleness

Report results can be cached with a TTL tied to data freshness.
A cache hit avoids re-running the query for the same parameters.
TTL must reflect how stale exported data can acceptably be.

▶Strangler-fig migration

Old synchronous endpoint stays live for legacy and small exports.
Async path introduced behind a feature flag, routed by tenant size.
Deprecation only once telemetry confirms the new path covers every case.

▢Multi-tenancy isolation

S3 artifacts scoped per tenant with short-TTL signed URLs.
Tenant-scoped keys ensure tenant A cannot fetch tenant B's export.
Access control enforced at the storage layer, not the application layer.
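Tenant-scoped keys are easy to get subtly wrong, so a small sketch may help. The key layout (`tenants/<tenant_id>/exports/...`) and both function names are assumptions for illustration. This is only the application-side half; the storage-layer enforcement the list calls for, such as a bucket policy restricted to the tenant prefix, is not shown.

```python
import re

_SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def export_key(tenant_id: str, job_id: str, fmt: str) -> str:
    """Build the object key for an export under the tenant's own prefix."""
    for part in (tenant_id, job_id, fmt):
        if not _SAFE_ID.match(part):
            # Reject slashes and dots so an ID cannot escape its prefix.
            raise ValueError(f"unsafe key component: {part!r}")
    return f"tenants/{tenant_id}/exports/{job_id}.{fmt}"


def assert_tenant_owns_key(tenant_id: str, key: str) -> None:
    """Raise PermissionError unless the key sits under this tenant's prefix."""
    # The trailing slash matters: "tenants/a" must not match "tenants/ab/...".
    prefix = f"tenants/{tenant_id}/"
    if not key.startswith(prefix):
        raise PermissionError(f"{tenant_id} may not access {key}")
```

A check like this would run before a signed URL is issued, so a request for another tenant's export fails before any URL exists.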

⚖Availability over strict consistency

At-least-once delivery plus idempotency keys chosen over exactly-once.
Exactly-once across a queue and email provider is expensive and fragile.
Same user-visible result achieved at significantly lower complexity.

Later, in the capture habit, these same patterns are encoded into agentic harnesses: CLAUDE.md files, ADRs, and agent skills that lock in the decisions already made so an agent does not re-derive or contradict them across sessions. But that comes after. This habit is the recall in your own head that lets you judge what the agent produces in the first place. If you cannot name the pattern, no context file will tell you the agent reached for the wrong one.

Defend every decision before anyone asks

This is not only about the final recommendation. Apply it to every choice in the template: the constraints you assumed, the options you admitted, the recommendation itself, and the risk mitigations. For each one, answer “why not the obvious alternative?” out loud. If you cannot, you have found a gap. Either change the decision, or write down the justification. A decision you cannot defend is one you do not yet fully understand. Real systems interrogate you eventually, through an incident, a review, or a successor asking why it was built this way. It is cheaper to face the question while you can still change the answer.

Applied to Acme.ai:

Why not C (serverless)? Runtime and size ceilings against the 1M-row case, plus a new ops surface for little gain at today’s volume. Worth revisiting at 10x volume or if the team wanted zero worker management.
Why not just the boring Option A? It buys time but does not enable scheduled reports and still couples generation to live infrastructure. It is a six-month patch, not a two-year answer. That said, the read replica from Option A ships immediately as a stopgap.
Why at-least-once plus dedupe instead of exactly-once? Exactly-once across a queue and an email provider is expensive and fragile; idempotency keys deliver the same user-visible guarantee far more cheaply.

Capture the decision so it compounds

A decision you can defend is worth little if it evaporates the moment you move on. A choice that lives only in your head, or in one buried thread, has to be rediscovered by the next person and re-derived by the next agent. The goal is to put it where it travels: into artifacts that are versioned, discoverable, and loadable by both humans and agents.

Three artifact types carry a decision forward.

Architecture Decision Records (ADRs) capture the reasoning behind a choice, not just the verdict. An ADR records the context, the options considered, why the chosen option won, and the conditions under which the decision should be revisited. It answers “why was this built this way?” at the moment when the reasoning is freshest, so no one has to reconstruct it from git blame six months later. The most valuable part is what was rejected and why: it stops engineers and agents from re-proposing alternatives that were already evaluated.

A minimal ADR structure:

Title: a short present-tense description of the decision (e.g., “Use async job queue for report generation”)
Status: accepted / proposed / deprecated / superseded by ADR-NNN
Context: the problem, the constraints, and the forces at play
Decision: what was chosen and the core reasoning
Consequences: what becomes easier, what becomes harder, and under what conditions this should be revisited

Fitness functions turn architectural constraints into automated checks that fail loudly when violated. Where a unit test asks “does this function return the right value?” a fitness function asks “does this system still behave the way we decided it should?” They make ADR constraints enforceable rather than advisory. Without them, the ADR is documentation. With them, it is architecture. Concrete examples for Acme.ai:

a CI test that fails if any interactive export takes longer than 5 minutes,
a linter rule that rejects any new synchronous file-generation path in the web server,
a monitor that alerts when queue depth exceeds the threshold that implies SLA breach,
a compliance check that confirms S3 artifacts carry the correct TTL lifecycle policy.
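The last of these can be a plain function over the bucket's lifecycle configuration. The sketch below assumes the configuration arrives as a dict in the shape S3 reports it (`Rules`, `Status`, `Filter.Prefix`, `Expiration.Days`), for example from boto3's `get_bucket_lifecycle_configuration`. It handles only simple prefix filters, and the function name and the prefix passed to it are hypothetical.

```python
from typing import Any, Dict


def artifacts_expire_within(
    lifecycle_config: Dict[str, Any], prefix: str, max_days: int
) -> bool:
    """True if an enabled rule expires objects under `prefix` within `max_days`."""
    for rule in lifecycle_config.get("Rules", []):
        if rule.get("Status") != "Enabled":
            continue
        # Simplification: only plain prefix filters are understood here.
        rule_prefix = rule.get("Filter", {}).get("Prefix", "")
        days = rule.get("Expiration", {}).get("Days")
        if prefix.startswith(rule_prefix) and days is not None and days <= max_days:
            return True
    return False
```

In CI this would sit inside a test that fetches the live configuration and asserts `artifacts_expire_within(config, "tenants/", 7)`. If someone disables or loosens the rule, the build fails loudly.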

Agent context files (CLAUDE.md) tell Claude Code how a specific part of the codebase works and what decisions have already been made. Claude Code loads the root CLAUDE.md at the start of a session and pulls in a subsystem directory's CLAUDE.md when it works with files in that directory, scoping context to where the agent is working. A useful CLAUDE.md encodes:

the architectural decisions that govern this subsystem, with links to the relevant ADRs,
the constraints the agent must never violate (e.g., “never buffer more than 10MB before streaming to S3”),
the idempotency contracts, naming conventions, and deployment assumptions,
an explicit “do not” section recording the alternatives the team has already rejected.

This is what prevents an agent from proposing the exact design that was debated and discarded three sprints ago.

Organizing these in a Claude Code project:

acme-ai/
├── CLAUDE.md                           # project-wide agent context
├── docs/
│   └── decisions/
│       ├── ADR-001-async-export-queue.md
│       ├── ADR-002-s3-artifact-storage.md
│       └── ADR-003-strangler-fig-rollout.md
├── src/
│   ├── api/
│   │   └── CLAUDE.md                   # api: enqueue only, return 202, no generation
│   └── workers/
│       ├── CLAUDE.md                   # worker: idempotency contract, streaming rules, size limits
│       └── export/
└── tests/
    └── fitness/
        ├── test_export_time_budget.py  # fails if any export exceeds 5 min
        ├── test_queue_depth_alert.py   # validates alerting thresholds
        └── test_artifact_ttl.py        # confirms S3 lifecycle policy is in place

The root CLAUDE.md carries the project-wide decisions: the async architecture, which services own which responsibilities, and what the team has explicitly ruled out. Each subsystem CLAUDE.md goes deeper: the workers/ file tells the agent to stream to S3 in chunks, use (schedule_id, period) as idempotency keys, and never add synchronous generation back to the request path. The fitness tests run in CI and make the ADR constraints measurable.

Applied to Acme.ai, the decision leaves behind:

docs/decisions/ADR-001-async-export-queue.md: records why Option B won, what Option C would require to become the right answer, and why Option A was rejected as a long-term solution.
tests/fitness/test_export_time_budget.py: fails in CI if any export exceeds 5 minutes.
tests/fitness/test_artifact_ttl.py: confirms the S3 lifecycle policy expires artifacts after 7 days.
src/workers/CLAUDE.md: tells the agent to stream in chunks, use (schedule_id, period) as idempotency keys, and never route generation back through the web server.

Solve it once, and you have solved it once. Capture it, and you have raised the floor for everyone who comes after.

The work that remains

Strip away the tooling and the trend pieces and the shift is simple. What AI does well is produce plausible artifacts. What remains is everything around them: deciding what to build, defending why, and making the decision durable enough that the next person and the next agent inherit it rather than rebuild it.

None of the five habits is new. Engineers have always structured decisions, weighed trade-offs, leaned on fundamentals, defended their choices, and written them down. What changed is the leverage. When the artifact takes seconds to generate, the judgment around it stops being one input among many and becomes the input. The engineer once valued for how much they could build is now valued for how well they can decide, and for how reliably those decisions hold across a team and across sessions.

That is a more demanding role, not a smaller one. It rewards exactly the things that were always hard and never automatable: clear reasoning, honest trade-offs, and the willingness to own an outcome an agent helped produce. Build those habits, and build the rails that let agents work inside them, and the speed AI brings becomes yours to direct instead of something to chase.
