A field guide to the AI landscape · June 2026 · from knowledge retrieval to autonomous execution
AI is shifting from reactive (chatbots that answer questions) to autonomous (systems that do the work). For strategy and operations, three shifts matter most:
What it is: A method for organizing human knowledge as a web of interconnected, plain-text notes rather than rigid folders. The methodology was named and popularized by Tiago Forte ("Building a Second Brain," course 2017, book 2022); tools like Obsidian, Notion, and Roam are implementations of it.
Strategic value: Most company knowledge is scattered, or locked in people's heads. Before AI can use it, it has to be written down and connected. Clean, linked notes are the raw material that lets AI understand how the organization actually works.
Read more: Building a Second Brain (Tiago Forte) →
What it is: Retrieval-Augmented Generation. Out of the box, a model only knows the public data it was trained on. RAG acts like an open-book exam: when a question comes in, the system searches the company's secure knowledge base, grabs the relevant pages, and hands them to the AI to read before it answers. The technique was introduced by Patrick Lewis and colleagues in a 2020 research paper.
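The open-book flow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production design: the three-document `KNOWLEDGE_BASE`, the word-overlap ranking in `retrieve`, and the `build_prompt` helper are all hypothetical. Real systems typically rank passages with embedding-based search rather than word overlap, and the resulting prompt would then be sent to a model.

```python
import re

# Hypothetical stand-in for a company knowledge base (document id -> text).
KNOWLEDGE_BASE = {
    "refund-policy": "Refunds are issued within 14 days of a return request.",
    "travel-policy": "Employees book travel through the internal travel portal.",
    "onboarding": "New hires receive a laptop on their first day.",
}


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, k=1):
    """Return the ids of the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda doc_id: len(q_words & tokenize(KNOWLEDGE_BASE[doc_id])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, k=1):
    """Assemble the 'open book': retrieved passages first, then the question."""
    sources = "\n".join(f"[{d}] {KNOWLEDGE_BASE[d]}" for d in retrieve(question, k))
    return (
        "Answer using only the sources below.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


prompt = build_prompt("How many days do refunds take?")
print(prompt)  # this string is what would be sent to the model
```

Only the retrieved passage travels with the question, so the model reads one relevant page instead of the whole knowledge base.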
Strategic value: It grounds AI in factual reality, reducing hallucination. Deployed in a controlled environment, it lets a business query proprietary data without exposing it to public models.
Read more: the original RAG paper (Lewis et al., 2020) →
What it is: The context window is the AI's working short-term memory in a single session, measured in tokens. Providers are racing to enlarge it (up to ~2 million tokens), but loading that much text creates a trap: you pay for every token in the window every time the model takes a step, so costs climb fast. Stuffing the window also backfires on quality: as the input grows, the model's accuracy degrades and it starts missing details, especially those buried in the middle. This effect, called "context rot," shows up well before the window is full.
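The cost side of that trap is simple arithmetic, sketched below. The numbers are assumptions chosen for illustration: the price of 3.00 USD per million input tokens is hypothetical, and the model ignores prompt-caching discounts that some providers offer. The point is only the shape of the curve: the whole window is billed again on every step.

```python
def billed_input_tokens(base_tokens, added_per_step, steps):
    """Total input tokens billed when the full window is re-sent on every step."""
    total = 0
    window = base_tokens
    for _ in range(steps):
        total += window            # the whole window is processed and billed this step
        window += added_per_step   # tool output and replies accumulate in the window
    return total


def cost_usd(tokens, usd_per_million):
    """Convert a token count to dollars at a flat per-million rate."""
    return tokens / 1_000_000 * usd_per_million


PRICE = 3.00  # hypothetical USD per million input tokens

# A 20-step agent run, adding 5,000 tokens per step.
stuffed = billed_input_tokens(200_000, 5_000, 20)  # whole files loaded up front
lean = billed_input_tokens(20_000, 5_000, 20)      # only relevant excerpts loaded

print(stuffed, round(cost_usd(stuffed, PRICE), 2))
print(lean, round(cost_usd(lean, PRICE), 2))
```

Under these assumptions the stuffed run bills 4.95 million input tokens against 1.35 million for the lean run, for the same 20 steps of work.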
Strategic value: Managing the context window is an economic necessity. Shoving entire files into the chat box on every query is a financially unsustainable way to run AI at scale.
Read more: Chroma's "Context Rot" study (18 frontier models) →
What it is: The technology that turns AI from an amnesiac chatbot into a continuous worker. Instead of dragging whole documents into every session, advanced systems use compaction — extracting the core facts, caching them to a long-term store, and wiping the active window to save cost.
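A minimal sketch of the compaction cycle described above, assuming a hypothetical `AgentMemory` class. In a real system a model would write the summary; here, messages tagged `FACT:` stand in for that extraction step so the example stays self-contained.

```python
class AgentMemory:
    """Toy memory with an active window and a long-term fact store."""

    def __init__(self):
        self.window = []     # active context: re-sent (and paid for) on every step
        self.long_term = []  # durable store of distilled facts

    def add(self, message):
        self.window.append(message)

    def compact(self):
        """Extract core facts, cache them long-term, then wipe the active window."""
        for message in self.window:
            if message.startswith("FACT:"):  # stand-in for model-written extraction
                fact = message[len("FACT:"):].strip()
                if fact not in self.long_term:  # skip duplicates
                    self.long_term.append(fact)
        self.window.clear()

    def session_preamble(self):
        """Short context for the next session, instead of the full transcript."""
        return "Known facts:\n" + "\n".join(f"- {f}" for f in self.long_term)


memory = AgentMemory()
memory.add("User pasted a 40-page vendor contract...")
memory.add("FACT: Vendor payment terms are net 30.")
memory.add("FACT: Vendor payment terms are net 30.")
memory.add("FACT: Contract renews every January.")
memory.compact()
print(memory.session_preamble())
```

After compaction the next session starts from two short facts rather than the 40-page document.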
The frontier: Anthropic's "Dreaming" (a research preview) has an idle agent review its own past session logs, merge duplicates, resolve contradictions, and reorganize its memory files — improving the agent's memory, not the underlying model weights.
Strategic value: You can't scale AI if it re-reads the entire knowledge base on every question. Memory and compaction let the system get sharper over time while keeping the bill in check.
Read more: Anthropic introduces "dreaming" for agent memory →
What it is: An open, universal connectivity standard introduced by Anthropic in late 2024, often described as the "USB-C port" for AI. Historically, connecting AI to Google Drive, Slack, or a SQL database meant building a custom, fragile integration for each one. MCP standardizes how any model connects to any secure data source or tool, and has since become one of the fastest-adopted open standards in software.
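To make "standardized connection" concrete: MCP messages are JSON-RPC 2.0, and a client discovers and invokes tools through methods such as `tools/list` and `tools/call`. The sketch below is a simplified in-process illustration of that message shape as I understand the public specification, not a conforming server. It omits the initialization handshake and transports, and the `lookup_invoice` tool and its handler are hypothetical.

```python
import json

# Hypothetical tool registry; a real server would wrap an actual system.
TOOLS = {
    "lookup_invoice": {
        "description": "Return the status of an invoice by ID.",
        "inputSchema": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
        "handler": lambda args: f"Invoice {args['invoice_id']}: paid",
    }
}


def handle(request):
    """Answer one JSON-RPC 2.0 request using MCP-style tool methods (simplified)."""
    method = request["method"]
    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
                for name, t in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" error code.
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "lookup_invoice", "arguments": {"invoice_id": "INV-42"}},
}
print(json.dumps(handle(call), indent=2))
```

Because every tool is described and called the same way, the client code does not change when the tool behind it does.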
Strategic value: It removes integration friction — plugging AI into existing infrastructure in hours rather than months, and cutting deployment cost.
Read more: Introducing the Model Context Protocol (Anthropic) →
What it is: Once agents can read files, send email, and trigger workflows, behavioral guardrails aren't enough — you need security architecture. Every serious enterprise agent needs a distinct identity, role, access policy, and audit log. This is different from guardrails: guardrails shape behavior; permissioning controls reach. (Microsoft, Okta, and others are now shipping dedicated "agent identity" products.)
Strategic value: A sales agent should not see payroll; an assistant agent should not approve payments above a threshold. This is where AI crosses from a useful tool into a governed operating model.
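The distinction between behavior and reach can be sketched as a policy check that runs outside the model. Everything here is hypothetical and simplified: the `AgentIdentity` class, the `POLICY` table, and the 500 payment threshold are illustrative assumptions, not any vendor's product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    role: str


# Hypothetical access policy: which resources each role may touch,
# and the largest payment it may approve.
POLICY = {
    "sales": {"resources": {"crm"}, "payment_limit": 0},
    "assistant": {"resources": {"calendar", "payments"}, "payment_limit": 500},
}

AUDIT_LOG = []  # every decision is recorded, allowed or not


def authorize(agent, resource, amount=0):
    """Allow the action only if the role covers the resource and the amount."""
    rule = POLICY.get(agent.role, {"resources": set(), "payment_limit": 0})
    allowed = resource in rule["resources"] and amount <= rule["payment_limit"]
    AUDIT_LOG.append(
        {"agent": agent.agent_id, "resource": resource, "amount": amount, "allowed": allowed}
    )
    return allowed


sales_bot = AgentIdentity("agent-001", "sales")
assistant_bot = AgentIdentity("agent-002", "assistant")

print(authorize(sales_bot, "crm"))                  # within the sales role
print(authorize(sales_bot, "payroll"))              # outside the sales role
print(authorize(assistant_bot, "payments", 200))    # under the threshold
print(authorize(assistant_bot, "payments", 5_000))  # over the threshold
```

The check does not depend on how the agent was prompted: a denied request is denied regardless of what the model was persuaded to attempt, and the attempt still lands in the audit log.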
Read more: Microsoft Entra Agent ID (identities for AI agents) →
What it is: Where MCP connects an agent to tools, A2A connects agents to each other — letting independent AI systems discover, negotiate, and delegate work. It runs on Agent Cards: standardized profiles (think an enterprise LinkedIn for AI) advertising an agent's skills, credentials, and access. A2A was launched by Google in 2025 and donated to the Linux Foundation for neutral, multi-vendor governance.
Strategic value: It prevents vendor lock-in. A support agent from one vendor can read the Agent Card of an accounting agent from another, verify its credentials, and hand off a task.
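The hand-off described above can be sketched as a lookup over advertised profiles. This is a loose illustration only: real Agent Cards are JSON documents with a richer schema defined by the A2A specification, and the field names here (in particular `auth_schemes`), the example URLs, and the `find_agent` and `delegate` helpers are hypothetical simplifications.

```python
# Hypothetical, simplified Agent Cards from two different vendors.
AGENT_CARDS = [
    {
        "name": "support-agent",
        "url": "https://agents.example.com/support",
        "skills": [{"id": "answer-ticket", "description": "Reply to customer tickets."}],
        "auth_schemes": ["oauth2"],
    },
    {
        "name": "accounting-agent",
        "url": "https://agents.example.org/accounting",
        "skills": [{"id": "reconcile-invoice", "description": "Match invoices to purchase orders."}],
        "auth_schemes": ["oauth2"],
    },
]


def find_agent(skill_id, accepted_auth):
    """Return the first card that advertises the skill and a shared auth scheme."""
    for card in AGENT_CARDS:
        has_skill = any(skill["id"] == skill_id for skill in card["skills"])
        auth_ok = bool(set(card["auth_schemes"]) & set(accepted_auth))
        if has_skill and auth_ok:
            return card
    return None


def delegate(task, skill_id, accepted_auth=("oauth2",)):
    """Pick a capable agent and record the hand-off (no network call is made)."""
    card = find_agent(skill_id, accepted_auth)
    if card is None:
        return {"status": "no_capable_agent", "task": task}
    # A real client would now send the task to card["url"].
    return {"status": "delegated", "to": card["name"], "endpoint": card["url"], "task": task}


print(delegate("Reconcile invoice INV-42", "reconcile-invoice"))
```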
Read more: Google donates A2A to the Linux Foundation →
What it is: A skill is a self-contained folder — instructions plus optional scripts and reference files — that an agent loads only when a task calls for it. Anthropic released Agent Skills in late 2025 and opened it as a standard; the agent pre-loads just each skill's name and one-line description, then pulls in the full content on demand ("progressive disclosure").
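Progressive disclosure can be sketched as two separate steps: index the short metadata at startup, then read a skill's full text only when a task needs it. The sketch assumes each skill folder holds a `SKILL.md` file that opens with `name` and `description` lines between `---` markers, which matches Anthropic's published format as I understand it. The parser is deliberately simplified (plain `key: value` lines, not full YAML), and the helper names are hypothetical.

```python
from pathlib import Path


def read_frontmatter(skill_md):
    """Parse simple 'key: value' lines between the leading '---' markers."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


def index_skills(root):
    """Startup step: only each skill's name and description enter the context."""
    index = {}
    for skill_md in sorted(Path(root).glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        index[meta["name"]] = {"description": meta["description"], "path": skill_md}
    return index


def load_skill(index, name):
    """On-demand step: pull in the full instructions for one skill."""
    return index[name]["path"].read_text(encoding="utf-8")
```

With a folder of skills on disk, `index_skills("skills")` returns the small always-loaded catalog, and `load_skill(index, "compliance-report")` fetches one skill's full instructions when a matching task arrives (both names are illustrative).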
Strategic value: Skills give an agent deep, repeatable competence (e.g., "fill out our compliance report the right way") without permanently bloating its context — directly addressing the Loading Dilemma in §3. They make capability modular, shareable, and auditable.
Read more: Equipping agents with Agent Skills (Anthropic) →
What it is: Taking the output of one model and feeding it to another, with each playing a role — one drafts, another critiques, another fact-checks, another formats. It treats models as an advisory bench rather than a single oracle.
Strategic value: It's how power users get higher-quality, cost-optimized output today (a cheap model formats; an expensive reasoning model does the hard logic) — and it's the manual precursor to automated orchestration. The one caution: after several hand-offs it gets easy to lose track of which model made which claim, so chained work needs source-tracking and human judgment.
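One way to keep track of which model made which claim is to record every hand-off as it happens. A minimal sketch, assuming hypothetical stub functions in place of real model calls and made-up model labels:

```python
def run_chain(task, stages):
    """Pass text through each stage in order and keep a provenance trail."""
    trail = []
    text = task
    for role, model_name, step in stages:
        text = step(text)
        trail.append({"role": role, "model": model_name, "output": text})
    return text, trail


# Stubs standing in for calls to different models (hypothetical).
def draft(text):
    return f"DRAFT: {text}"


def critique(text):
    return text + " [critique: cite a source for the growth figure]"


def format_output(text):
    return text.strip().upper()


STAGES = [
    ("drafter", "reasoning-model", draft),
    ("critic", "second-model", critique),
    ("formatter", "cheap-fast-model", format_output),
]

final, trail = run_chain("Summarize Q3 vendor spend.", STAGES)
for entry in trail:
    print(entry["role"], "via", entry["model"], "->", entry["output"])
```

The trail is what a human reviewer reads when a claim in the final output looks wrong: it shows the stage where the claim first appeared.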
Read more: Building Effective Agents — the prompt-chaining pattern (Anthropic) →
What it is: The shift from prompt-driven AI (waiting for a human) to event-driven AI. Always-on agents run 24/7; when an event fires — a vendor invoice lands, code is merged — a lead agent spins up a temporary digital team of parallel subagents to research, verify, and synthesize, then hands the result back for approval.
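The event-driven pattern described above can be sketched with standard-library threads. This is an illustration under stated assumptions: the event shape, the `research` and `verify` subagents, and the `awaiting_human_approval` status are hypothetical, and the subagents are plain functions standing in for model-backed workers.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub subagents (hypothetical); real ones would call models and tools.
def research(event):
    return f"vendor {event['vendor']} is on the approved list"


def verify(event):
    return f"amount {event['amount']} matches the purchase order"


SUBAGENTS = {"research": research, "verify": verify}


def lead_agent(event):
    """Fan the event out to parallel subagents, then synthesize for approval."""
    with ThreadPoolExecutor(max_workers=len(SUBAGENTS)) as pool:
        futures = {name: pool.submit(fn, event) for name, fn in SUBAGENTS.items()}
        findings = {name: future.result() for name, future in futures.items()}
    return {
        "event": event["type"],
        "findings": findings,
        "status": "awaiting_human_approval",  # the result goes back to a person
    }


# Event type -> handler: no human prompt is needed to start the work.
HANDLERS = {"invoice_received": lead_agent}


def on_event(event):
    handler = HANDLERS.get(event["type"])
    return handler(event) if handler else None


print(on_event({"type": "invoice_received", "vendor": "Acme", "amount": 1200}))
```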
Strategic value: This moves the payoff from "saved an hour of typing" to "ran a whole process start to finish." Specialization also helps accuracy — a few focused agents beat one model trying to do everything.
Read more: Seizing the agentic AI advantage (McKinsey) →
What it is: The shift from AI working through code back-ends (APIs) to AI operating the user interface directly — seeing the screen and driving mouse and keyboard like a person. The breakthrough capability was pioneered by Anthropic's Computer Use (with Claude 3.5 Sonnet) and brought to consumers by OpenAI's Operator in early 2025.
A note on OpenClaw: OpenClaw is a widely cited example here, but it's worth being precise. It's an open-source, always-on personal agent (created by Peter Steinberger; it wires a model into apps like Gmail, Slack, and iMessage), not the originator of mouse-and-keyboard computer use. It's a good illustration of consumer agents and, given its documented security vulnerabilities, also a cautionary tale (see §16).
Strategic value: AI no longer needs a custom API for every system. If a human can do it through a browser or desktop app, an agent can operate it — which unlocks automation of legacy software that was never built to be integrated.
Read more: Computer Use vs. Operator, compared (WorkOS) →
What it is: Demos look magical, but the frontier question is no longer "can AI do it?" It's "can AI do it reliably, repeatably, securely, and accountably?" Enterprise deployment lives or dies on repeatability, audit trails, fallback paths, and failure handling. (By one widely cited 2025 MIT study of 300+ initiatives, only a small fraction translate pilots into measurable P&L impact.)
Strategic value: The honest takeaway: AI can automate more than expected — but only when it's wrapped in systems that make failure visible and controllable.
Read more: the disciplines separating demos from deployment (VentureBeat) →
What it is: A model is just a raw engine. A harness (or scaffolding) is the runtime built around it — the loop that decides how the model gets context and tools and how it iterates. It's worth separating three things people often blur: the harness (the runtime loop), guardrails (behavioral/policy limits), and evals (measurement, §14). Modern harnesses also enable self-healing: hit an error, read the log, try a different approach, and retry — rather than crashing.
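The self-healing loop (hit an error, read the log, try a different approach, retry) can be sketched as follows. The function and strategy names are hypothetical. In a real harness the model would read the error log and choose the next approach; here a fixed list of fallback strategies stands in for that decision.

```python
def run_with_self_healing(task, strategies, max_attempts=3):
    """Try each strategy in turn, logging failures instead of crashing."""
    error_log = []
    for attempt, strategy in enumerate(strategies[:max_attempts], start=1):
        try:
            result = strategy(task, error_log)
            return {"result": result, "attempts": attempt, "errors": error_log}
        except Exception as exc:
            # Record what went wrong so the next attempt can take it into account.
            error_log.append(f"attempt {attempt} ({strategy.__name__}): {exc}")
    raise RuntimeError("all strategies failed:\n" + "\n".join(error_log))


def parse_strict(task, error_log):
    """First approach: assume the input is a clean integer."""
    return int(task)


def parse_after_cleanup(task, error_log):
    """Fallback approach: strip thousands separators, then parse."""
    return int(task.replace(",", ""))


outcome = run_with_self_healing("1,200", [parse_strict, parse_after_cleanup])
print(outcome["result"], "after", outcome["attempts"], "attempts")
print(outcome["errors"])
```

The failure is kept in the returned log rather than swallowed, and a run where every strategy fails raises an error, so a task cannot report success without having produced a result.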
Strategic value: This is the reliability and risk layer that makes autonomous agents safe to deploy.
Read more: Building Effective Agents — workflows vs. agents (Anthropic) →
What it is: Automated test suites for AI output. They check whether the AI answered correctly, followed policy, used the right source, avoided hallucination, and handled edge cases — before a human ever sees the result.
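A minimal sketch of such a suite, covering two of the checks listed above (correct answer, right source). The eval cases, the `toy_system` stub, and the answer format with `text` and `sources` fields are hypothetical; a real suite would call the deployed system and usually include many more cases and check types.

```python
# Hypothetical eval cases: what a good answer must contain and cite.
EVAL_CASES = [
    {
        "question": "What is the refund window?",
        "must_include": ["14 days"],
        "must_cite": "refund-policy",
    },
    {
        "question": "How do employees book travel?",
        "must_include": ["travel portal"],
        "must_cite": "travel-policy",
    },
]


def toy_system(question):
    """Stub for the AI system under test; it only knows about refunds."""
    if "refund" in question.lower():
        return {"text": "Refunds are issued within 14 days.", "sources": ["refund-policy"]}
    return {"text": "I am not sure.", "sources": []}


def grade(case, answer):
    """Run each check and report it separately, so failures are diagnosable."""
    return {
        "correct": all(phrase in answer["text"] for phrase in case["must_include"]),
        "right_source": case["must_cite"] in answer["sources"],
    }


def run_evals(system, cases):
    """Return the pass rate plus per-case check results."""
    results = [grade(case, system(case["question"])) for case in cases]
    passed = sum(all(checks.values()) for checks in results)
    return passed / len(cases), results


pass_rate, details = run_evals(toy_system, EVAL_CASES)
print(f"pass rate: {pass_rate:.0%}")
for case, checks in zip(EVAL_CASES, details):
    print(case["question"], checks)
```

Swapping `toy_system` for a different model, prompt, or vendor and rerunning the same cases yields directly comparable pass rates.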
Strategic value: Evals turn AI from subjective "seems good" demos into measurable systems. They are how leadership compares models, vendors, prompts, and workflows on evidence rather than vibes.
Read more: Your AI Product Needs Evals (Hamel Husain) →
What it is: The board-level questions underneath every deployment: where does our data physically go, is it used to train a vendor's model (and can we opt out), are we on an enterprise tier with the right contractual protections, and what's our exposure under privacy regimes (e.g., PIPEDA, GDPR)?
Strategic value: The difference between "AI as a productivity win" and "AI as a data-leak incident" is almost entirely a function of which environment and tier the work runs in. This decision precedes the fun parts.
Read more: the NIST AI Risk Management Framework →
Context rot: too much context degrades precision (§3).
Tool misuse & security exposure: agents sitting at the center of workflows with broad permissions are a prime target for prompt-injection and malicious extensions. OpenClaw's documented vulnerabilities — serious enough that some governments restricted its use — are the live cautionary example.
Silent failure: an agent completes a task incorrectly but confidently reports success.
Tokenmaxxing: a real, recent anti-pattern — Amazon shut down an internal AI usage leaderboard ("KiroRank") after employees ran needless agents to inflate token counts, burning compute to game the metric. The lesson (a textbook Goodhart's-Law trap): measure business outcomes, never raw AI consumption. (Yahoo Finance, heise)
Read more: the OWASP Top 10 for LLM Applications →
What it is: The overhaul of the software development lifecycle. Tools like Claude Code live inside the developer's environment, reading the codebase, hunting bugs, and deploying fixes. This fuels vibe coding — building real applications in plain English instead of syntax. (The term was coined by Andrej Karpathy in February 2025 and became Collins Dictionary's Word of the Year for 2025.) Advanced teams run parallel windows: an expensive reasoning model designs architecture in one, a cheap, fast model generates code in another.
Strategic value: It decouples software velocity from engineering headcount; business and ops staff can prototype and deploy internal tools directly. As a directional signal, Anthropic reports its own per-engineer productivity up sharply this year on the back of Claude Code, which now accounts for a meaningful share of public code commits — vendor-reported, but a real datapoint worth pressure-testing.
Read more: Boris Cherny, head of Claude Code, on what comes after coding →
What it is: The organizational consequence. Companies shift from hiring a person for every workflow step toward designing workflows where people supervise fleets of agents. Roles move from "doer" to "specifier, reviewer, exception-handler, and owner."
Strategic value: This is the leadership-level implication: AI changes spans of control, team sizes, hiring plans, vendor strategy, and operating leverage — not just per-task productivity.
Read more: The agentic organization — a new operating model (McKinsey) →
What it is: The decision framework. AI strategy isn't just "pick a model." It's choosing when to buy packaged features (Microsoft Copilot, Salesforce, ServiceNow), when to build custom agents for proprietary workflows, and when to compose systems from MCP, A2A, internal data, and best-in-class models.
Strategic value: Composition is where the highest ROI currently sits — orchestrating bespoke workflows out of off-the-shelf parts, without betting the company on building foundation models.
Read more: how 100 enterprise CIOs are building and buying gen AI (a16z) →
To turn this from a landscape into action, here are the questions any organization can ask of its own operations:
Knowledge structuring — where is internal documentation too messy for AI to read, and what gets cleaned first?
Workflow bottlenecks — where do high-volume inquiries, logistics, or vendor reconciliation stall on manual hand-offs?
Software velocity — where could ops or business teams prototype their own internal tools instead of queuing for engineering?
Governance — where is strict human-in-the-loop approval required versus full automation, and what data can never leave the environment?
Coined the "LLM OS" framing (AI as the CPU of a new operating system) and the term "vibe coding." His tutorials are the gold standard for cutting through hype. (TechCrunch, karpathy.ai)
The person to read for the AI SDLC — how agentic coding is changing what "software engineer" means. (Lenny's Newsletter)
The primary evangelist for agentic workflows — showing that an older, cheaper model inside a structured Plan → Execute → Review loop can beat a newer model used zero-shot. (The Batch)
LangGraph is among the dominant frameworks enterprises use to orchestrate multi-agent systems and manage agent state and memory. (blog.langchain.dev)
The sharpest tracker of how builders actually use these tools, and of the economics of the AI SDLC. (latent.space)
Unusually good at cutting through hype and explaining what actually works — and the leading practical voice on prompt-injection and agent security. (simonwillison.net)