AI Toolset for Software Architects (Q1 2026)

26 February 2026 (Updated: 07 March 2026), 17 min read

Overview of AI tools and categories used by software architects in Q1 2026, covering research, brainstorming, planning, coding, and quality assurance

Artificial intelligence is now embedded in day-to-day architecture work. The question is no longer which AI tool to adopt — it is how to curate, evolve, and integrate a continuously updated toolset across the full software development lifecycle. Since our Q3 2025 edition, the landscape has shifted in meaningful ways: we added two categories (Planning & Project Management and Quality Assurance / Continuous Integration), one option we expected to lead (OpenAI Codex) underdelivered in practice, and our prompting approach matured into a discipline that cuts across everything we do.

This post covers the current state of our AI toolset as of February 2026, highlights what changed, and shares the lessons we picked up along the way. If you are new to the series, the Q3 2025 edition covered four categories — Searching & Learning, Brainstorming & Ideation, Prompt Engineering, and Agentic Coding & Prototyping. Here is what shifted since then.

How the Architect's Role Continues to Evolve with AI

Four principles guide how we build and maintain our toolset:

  1. Build a complementary toolset, not one perfect tool. No single AI covers everything. Combine tools that cover research, design, planning, coding, and quality — each doing what it does best.
  2. Treat your toolset as continuously evolving. Adopt an experimentation habit rather than searching for a stable, final configuration. What works today may be outpaced in a quarter — or sometimes within a week. For example, once Skills gained broad attention as a reusable capability standard, MCP (Model Context Protocol — a standard for connecting AI agents to external tools and data sources) usage shifted away from being a local tooling accelerator and toward a clearer role: bridging agents to remote services and live data.
  3. Leverage AI across all lifecycle stages. From early research and design through planning, implementation, and into maintenance — there is no phase where AI cannot accelerate or augment your work.
  4. Capture and operationalize shared knowledge. Decisions made in the morning should inform work done in the afternoon. Meeting transcription and insight extraction are key parts of this process. Maintaining up-to-date documentation is essential for AI to provide accurate, relevant assistance.

What Changed Since Q3 2025

The most significant structural change in our toolset is that Prompt Engineering is no longer its own category. It has become a metacategory — a cross-cutting discipline that shapes how we interact with every other tool. Good prompting is now an integral part of all our AI workflows, not a specialized skill in one workflow stage. (See Prompt Engineering as a Metacategory below for what this means in practice.)

Equally significant: we expanded into two new areas that were missing from the previous edition:

  • Planning & Project Management — we started actively using AI to support backlog grooming, task decomposition, and integration with project management tools.
  • Quality Assurance / Continuous Integration — we now use AI-assisted hooks, code review automation, and browser testing tools as part of our CI feedback loops.

The five-category model we now use better reflects where architects actually spend time.

AI Tool Categories — Q1 2026

1. Searching & Learning

Perplexity.ai remains our primary tool for quick search across sources, exploring new technologies, and summarizing recent research. What has changed is how we use it: increasingly via its API inside more complex, multi-step workflows rather than as a standalone chat interface.

Obsidian is where this research ultimately lands, and Markdown is now our default documentation language. That decision is mostly practical: Markdown is the native format we use to communicate with LLMs, and it handles prose, code snippets, and diagram-as-code (for example Mermaid) equally well in one place. In practice, this gives us a better AI interface than wiki-style pages such as Confluence, which are harder to keep structured, portable, and model-friendly at scale. Every research thread ends as a curated Markdown knowledge-base entry with clear explanations, practical analogies, best practices, and verified examples. We then connect this local knowledge base back into AI chats via local MCP/RAG, creating a continuously improving feedback loop between daily work and future AI sessions.

A typical pattern looks like this:

  1. Use Perplexity API to search and identify relevant sources on a topic.
  2. Pipe results into Firecrawl to scrape and extract full content from the most promising pages.
  3. Summarize and synthesize using an LLM (e.g., Claude) to extract key insights from the aggregated content.
  4. If the topic is generic enough to be reusable, create a structured Markdown entry in Obsidian with the synthesized insights, examples, and references.
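For teams that want to script this, the four steps above can be sketched in a few dozen lines of Python. Everything here is illustrative: the endpoint URLs, the model name, and the note-formatting helper are assumptions made for the sketch, so check the Perplexity and Firecrawl API docs before relying on them. The LLM synthesis step (step 3) is left out and would follow the same `_post_json` pattern.

```python
import json
import urllib.request

# Endpoint URLs and model name are assumptions for illustration only —
# verify them against the vendors' current API documentation.
PERPLEXITY_URL = "https://api.perplexity.ai/chat/completions"
FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape"


def _post_json(url: str, payload: dict, api_key: str) -> dict:
    """POST a JSON payload with bearer auth and return the parsed response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def search_sources(topic: str, api_key: str) -> dict:
    """Step 1: ask Perplexity for relevant sources on a topic."""
    return _post_json(PERPLEXITY_URL, {
        "model": "sonar",  # model name is an assumption
        "messages": [{
            "role": "user",
            "content": f"List the most relevant sources on: {topic}",
        }],
    }, api_key)


def scrape_page(url: str, api_key: str) -> dict:
    """Step 2: pull full page content through Firecrawl as Markdown."""
    return _post_json(FIRECRAWL_SCRAPE_URL,
                      {"url": url, "formats": ["markdown"]}, api_key)


def build_obsidian_note(title: str, synthesis: str, sources: list[str]) -> str:
    """Step 4: render synthesized insights as a Markdown knowledge-base entry."""
    refs = "\n".join(f"- {s}" for s in sources)
    return f"# {title}\n\n{synthesis}\n\n## References\n\n{refs}\n"
```

The note-building helper is the piece worth keeping deterministic: however the upstream APIs evolve, the knowledge-base entry format stays stable and reviewable in Git.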

This pipeline makes AI-powered research significantly more thorough and reproducible than ad-hoc searches. It is especially effective when you need to validate architectural assumptions against current documentation, vendor blogs, and community discussions simultaneously.

Over time, these entries become curated, personalized training content for the team. We loop that context back into future AI sessions so new prompts are grounded in our language, decisions, and mental models rather than generic internet summaries.

During research, we also ask AI to explain complex problems and architectures visually using Mermaid diagrams. Mermaid works especially well with Markdown-native editors because diagrams are rendered directly from source, without the extra PNG/SVG export step, which keeps them easy to update as ideas evolve. In practice, we use it for multiple diagram types: flowcharts, sequence diagrams, state diagrams, and C4 diagrams (context, container, component, and deployment views). Because these notes live in Git, diagram changes can be reviewed in normal pull requests, where team members comment on specific lines and propose improvements exactly as they do for code.

2. Brainstorming & Ideation

Chat interfaces continue to work well for early-stage thinking. We use both ChatGPT and Claude (the chat interface, not Claude Code) depending on the task — neither has displaced the other in this category. What matters more than the model is the quality of the prompt and the clarity of what you are trying to explore.

Brainstorming use cases remain largely unchanged: generating design alternatives, stress-testing assumptions, exploring edge cases before committing to a direction, and facilitating design discussions when you want a second (non-human) perspective.

What has improved is our capture-and-review loop. The difference is concrete. Before: chat logs from design discussions and meeting notes sit in Slack threads or personal notebooks, slowly going stale and disconnected from the codebase. After: AI converts those same transcripts and chat outputs into structured Markdown documentation with Mermaid diagrams that live in the repo, are version-controlled, and stay current as the design evolves. In practice, this means a 30-minute architecture brainstorm now produces a pull request with updated context diagrams and decision rationale — not a Slack message that three people will search for next week and nobody will find.

3. Planning & Project Management

This is a new category in our toolset — and one of the more nuanced additions.

For smaller projects, we prefer simple Markdown-based backlogs and task lists maintained in the repository. They are lightweight, version-controlled, and work well with AI tools that can read and update them directly.

For larger projects involving multiple team members, we use the Atlassian MCP to interact with Jira within Claude Code sessions. However, there is an important practical caveat: the Atlassian MCP is token-heavy and consumes significant context. Our recommendation is to avoid running it continuously. Instead, adopt a sync-work-sync pattern:

  1. Sync Jira → Markdown (pull the current state into local files).
  2. Do the planning and task management work in Markdown.
  3. Sync Markdown → Jira (push updates back) in a separate session.

This keeps your AI sessions focused and avoids flooding the context window with Jira API overhead.
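A minimal sketch of the sync step, assuming the Jira Cloud REST search endpoint and basic auth (verify both against Atlassian's current API documentation — the endpoint path here is our assumption). The Markdown rendering is the part the local AI sessions actually consume:

```python
import base64
import json
import urllib.parse
import urllib.request


def fetch_issues(base_url: str, jql: str, email: str, token: str) -> list[dict]:
    """Sync step 1: pull matching issues from Jira Cloud.

    The /rest/api/3/search path is an assumption — check Atlassian's docs.
    """
    url = f"{base_url}/rest/api/3/search?jql={urllib.parse.quote(jql)}"
    auth = base64.b64encode(f"{email}:{token}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Reduce each issue to the fields the local backlog needs.
    return [
        {"key": i["key"],
         "summary": i["fields"]["summary"],
         "status": i["fields"]["status"]["name"]}
        for i in data.get("issues", [])
    ]


def issues_to_markdown(issues: list[dict]) -> str:
    """Sync step 2: render simplified issues as a Markdown backlog file."""
    lines = ["# Backlog (synced from Jira)", ""]
    for issue in issues:
        done = "x" if issue["status"] == "Done" else " "
        lines.append(f"- [{done}] {issue['key']}: {issue['summary']}")
    return "\n".join(lines) + "\n"
```

The reverse sync (Markdown → Jira) follows the same pattern with the issue-update endpoint, and belongs in its own session.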

A broader observation: Jira is becoming less useful for engineering teams. In practice, it primarily serves project managers who need visibility — but it adds friction for engineers. We are actively exploring lighter alternatives, starting with Markdown-based backlogs in Git repositories, and watching Linear as a potential replacement. Some of the more technical project managers have already started using Claude Code to "ask" Markdown-based backlogs for status updates and to report progress to stakeholders — a much more efficient workflow than navigating Jira's UI for the same purpose.

In practice, this category now consumes more time than coding in early phases of a project. We use AI to shape clearer roadmaps and produce more detailed milestones and task definitions that are implementation-ready for the team.

This up-front decomposition is essential for AI-first delivery: when milestones and tasks are explicit, scoped, and well-sequenced, execution accelerates dramatically. We intentionally invest more effort at the beginning to align on the problem and solution space before writing code.

4. Coding and Prototyping

Claude Code remains our frontier tool for agentic coding. After a brief experiment with OpenAI Codex and Augment Code, we returned to Claude Code — it remains ahead in terms of agentic behavior, context handling, and integration ecosystem.

Our current focus is on building skills and subagent libraries for specific projects and testing how those building blocks scale in larger teams (3+ engineers), with mixed but promising results so far. Rather than relying on general-purpose prompting (or off-the-shelf prompt templates and frameworks), we invest in project-specific, high-quality assets:

  • Skills: reusable, task-specific prompt packages (slash commands) that encode domain knowledge, constraints, and preferred workflows for a given project context. Think of them as documented expert procedures that any agent session can load. A concrete example is a Docker image validation skill focused on checking images against container best practices and verifying base-image choices. Skills like this often bundle scripts and validation logic, giving agents stronger guardrails than prompt text alone.

  • Subagents: specialized Claude Code agent instances scoped to a focused task — for example, a Domain-Driven Design expert that helps identify bounded contexts, a Code Reviewer tuned to a specific language or framework, or a Trade-off Analyst that evaluates architectural options against defined criteria.

The distinction matters: a skill is a single-purpose, reusable prompt package, while a subagent is a scoped execution context that can run in parallel sessions and be orchestrated with other subagents. Skills are what subagents follow.

Once the library of skills and subagents is in place, we execute project backlogs using teams of subagents operating with those specific skills. This dramatically improves consistency and reduces the per-task prompting overhead.
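To make the Docker image validation skill concrete, here is the kind of deterministic check such a skill might bundle alongside its prompt text. The allowlist and parsing are illustrative assumptions — a real policy would be richer, and this simplified FROM parsing ignores flags such as --platform and build-stage references:

```python
# Illustrative allowlist of approved base-image repositories — a real skill
# would encode the team's actual policy.
APPROVED_BASES = {"eclipse-temurin", "python", "node", "nginx"}


def base_images(dockerfile: str) -> list[str]:
    """Extract the image reference from every FROM line (skipping scratch)."""
    images = []
    for line in dockerfile.splitlines():
        parts = line.strip().split()
        if parts and parts[0].upper() == "FROM" and len(parts) >= 2:
            image = parts[1]
            if image.lower() != "scratch":
                images.append(image)
    return images


def check_bases(dockerfile: str, approved=APPROVED_BASES) -> list[str]:
    """Return violations: base images whose repository is not approved.

    Simplified parsing: strips the tag, then takes the last path segment
    (registries with explicit ports are not handled here).
    """
    violations = []
    for image in base_images(dockerfile):
        repo = image.split(":")[0].split("/")[-1]
        if repo not in approved:
            violations.append(image)
    return violations
```

An agent session that loads the skill can run this script after generating or editing a Dockerfile, turning "follow container best practices" from a prompt instruction into a verifiable guardrail.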

We continue to use a small, curated set of MCPs:

  • Serena: codebase querying — works especially well with larger codebases.
  • Context7: documentation browsing during coding sessions.
  • Sequential Thinking: structured planning and multi-step reasoning.

Keeping the MCP list small and intentional is a deliberate choice. Each MCP adds context overhead and potential noise — more is not always better (we are aware of Claude's MCP auto-discovery feature, but managing context explicitly has proven more reliable).

We also maintain a few cross-project skills that multiple subagents share, such as (but not limited to):

  • Documentation structure: keeps commercial and open-source documentation aligned to a shared, easy-to-navigate structure, with explicit documentation-map files that make content simple for both humans and LLMs to discover and traverse.
  • Documentation style guide: enforces a consistent writing style inspired by the Microsoft Writing Style Guide, implemented as a practical skill split into categories with English examples.
  • Mermaid diagram skill: translates natural-language architecture and ADR ideas into the most appropriate Mermaid diagram type and applies consistent visual conventions (shapes, colors, and notation) across repositories.

5. Quality Assurance / Continuous Integration

This is the other new category, and it is one of the most promising areas for architects who want to influence engineering standards at scale.

Claude Code Hooks are our first "tool" here. Hooks are lifecycle event handlers in Claude Code that can execute deterministic checks at specific points in the agent workflow. A practical example: attach a script to the Stop event (or SubagentStop if you are working with subagent orchestration) that automatically runs tests and linters after each agent completes its task. This turns AI-generated code into code that is continuously validated — not just generated.
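As a sketch, a Stop hook could invoke a script like the one below. The check commands are placeholders for your project's real test and lint commands, and the exit-code convention (2 signaling a blocking failure back to the agent) is our reading of the hooks documentation — verify it against your Claude Code version:

```python
import subprocess
import sys

# Placeholder commands — substitute your project's actual test/lint invocations.
CHECKS = [
    ("tests", ["pytest", "-q"]),
    ("lint", ["ruff", "check", "."]),
]


def summarize(results: dict) -> tuple:
    """Collapse named exit codes into an overall pass flag and a short report."""
    failed = [name for name, code in results.items() if code != 0]
    if failed:
        return False, "Failed checks: " + ", ".join(failed)
    return True, "All checks passed"


def main() -> int:
    results = {name: subprocess.run(cmd).returncode for name, cmd in CHECKS}
    ok, report = summarize(results)
    # Hook output on stderr is what gets surfaced back to the agent.
    print(report, file=sys.stderr)
    # Exit code 2 is assumed here to mark a blocking failure — confirm the
    # convention in the hooks docs for your Claude Code version.
    return 0 if ok else 2


if __name__ == "__main__":
    sys.exit(main())
```

Because the script is deterministic, it gives the same verdict whether a human or an agent wrote the code — which is exactly the point.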

GitHub Copilot for code review has improved noticeably. AI-assisted code review now covers basic quality checks reliably, and we are increasingly comfortable using it as a first pass before human review — particularly for generated code, configuration files, or changes that are high-volume but lower-risk. We still keep humans in the loop for anything critical or architecturally significant.

Chrome DevTools MCP rounds out this category. It provides a simple but effective way to give AI visibility into web application behavior — inspecting the DOM, checking network requests, and validating rendering — without leaving the agent workflow. For web-focused projects, this closes a gap that previously required manual browser inspection.

We are exploring how to use AI to automatically validate whether code stays consistent with architectural decisions and documentation. Today we do this periodically as part of our QA process, but the goal is tighter integration — running architectural consistency checks directly in CI pipelines so that drift is caught early, not during a quarterly review. This connects directly to principle #4: if architectural decisions and documentation are kept up to date and in AI-friendly formats, automated validation becomes practical rather than aspirational. Early experiments are promising, though the challenge remains defining "consistency" precisely enough for an AI check to be actionable without generating noise.
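One shape these experiments take: encoding an ADR's dependency rules as a deterministic check that CI can run on changed files. The module names and rules below are hypothetical stand-ins — the hard part, as noted, is defining them precisely enough to be actionable:

```python
import re

# Hypothetical ADR-derived rule set: top-level packages that must not
# import from each other (e.g., bounded contexts kept independent).
FORBIDDEN = {
    "billing": {"shipping"},   # billing must not depend on shipping internals
    "shipping": {"billing"},
}


def import_targets(source: str) -> list[str]:
    """Extract the top-level package name from each import statement."""
    pattern = re.compile(r"^\s*(?:from|import)\s+([A-Za-z_][\w.]*)", re.MULTILINE)
    return [match.split(".")[0] for match in pattern.findall(source)]


def check_drift(module: str, source: str, rules=FORBIDDEN) -> list[str]:
    """Return imports that violate the dependency rules for this module's context."""
    context = module.split(".")[0]
    banned = rules.get(context, set())
    return [target for target in import_targets(source) if target in banned]
```

A check like this catches structural drift cheaply in CI; the AI layer then adds value on top, explaining violations in terms of the original architectural decision rather than just failing the build.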

Prompt Engineering as a Metacategory

In Q3 2025, Prompt Engineering had its own category. It no longer does — not because it matters less, but because it matters everywhere.

Good prompting is now the backbone of how we interact with every tool in the list. The principles that emerged from dedicated prompt engineering practice — clear context, well-scoped tasks, explicit constraints, iterative refinement — apply equally to Perplexity searches, Jira syncs, Claude Code subagent configurations, and QA hooks.

At the same time, this is not a rigid "ways of working" template. Prompt engineering is iterative, dynamic, and continuously adjusted as project context changes. The process is rarely linear or fully repeatable; it is a creative collaboration where human judgment remains essential.

Treating prompting as a metacategory is also a signal to teams: it is an engineering discipline, not a soft skill. It deserves the same investment as writing good tests or designing good APIs.

To Explore

The toolset never stands still. Here is what is on our radar for the coming quarter:

  • Linear — piloting as a lighter replacement for Jira, with a developer experience that better fits engineering-first teams.
  • Brave Search — evaluating as an alternative or complement to Perplexity for searching and learning, particularly for privacy-sensitive research contexts.
  • Playwright CLI with AI — exploring AI-assisted testing capabilities through Playwright's CLI interface.
  • GitHub Agentic Workflows (preview) — define GitHub workflows in Markdown rather than YAML alone. A promising direction for extending what CI pipelines can do.

Our Current Thoughts on Working with AI Tools

After another quarter of hands-on work, four field notes from the trenches stand out:

AI accelerates thinking — it does not replace it. AI is a force multiplier for engineering judgment, not a substitute. The more architectural maturity you bring to AI collaboration, the better your outcomes. Architects who lean on AI to skip the thinking tend to get mediocre results faster. (For more on the foundational skills that make this work, see our post on the Staff Engineer Toolkit.)

Prompting is an engineering discipline. Own the context. We consistently observe a significant difference between sessions that have too little context (under-informed, generic output) and sessions that have too much context (noisy, unfocused output). There is no silver bullet — but the habit of researching and documenting context close to the task before starting an AI session pays dividends every time.

AI amplifies both good and bad practices. If your processes are sloppy, AI will make them sloppier at scale. If your architectural thinking is sharp, AI will amplify it. This is perhaps the strongest argument for investing in learning and foundational understanding: in a landscape where implementation details can be delegated to AI, the competitive advantage lies in understanding the why behind those details — well enough to guide, review, and correct the AI's output.

Automated information flow and collection is key. Whether it's research, documentation, or test case generation, the quality of AI output depends critically on the information it has access to. Automated (or semi-automated) collection of up-to-date information from multiple sources (meeting transcripts/notes, Architecture Decision Records, Confluence pages, Roadmaps, changelogs, etc.) and transforming it into consistent AI-friendly formats is essential.

Closing Principles

There is still no final or perfect AI toolset for software architects. Quarter by quarter, the practice comes down to four durable commitments:

  • Continuous evolution: adopt, test, and retire tools as needs change — our short-lived trial of OpenAI Codex this quarter is a fresh example. One quarter's frontier tool may be the next quarter's baseline.
  • Balanced coverage: ensure the toolset spans research, ideation, planning, coding, and quality. Gaps in any stage create bottlenecks that slow the whole team.
  • Architect mindset first: tools augment judgment; they do not replace it. The most valuable thing you bring to any AI collaboration is clear thinking about what you actually need and why.
  • Shared knowledge as infrastructure: capture decisions, research, and context in formats that both humans and AI can use. The quality of AI output depends on the quality of information it has access to.

The architects who thrive in this environment are not those who adopt the most tools — they are the ones who build the habit of thoughtful experimentation and honest evaluation, one quarter at a time.


The following is a terminology note for readers less familiar with how "agent" and "agentic" are used in this post and in the broader industry.

Agent or Agentic

"Agent" and "agentic" are among the most overloaded terms in the AI space — right alongside "artifact" or "service." They mean very different things depending on context, and the lack of consensus creates real confusion. Here is how the ambiguity plays out in practice:

  • "Agent" as a runtime orchestrator — tools like Claude Code that autonomously plan, execute, and iterate on multi-step tasks. This is the strongest sense of the word: an agent that reasons about what to do next and acts on it.
  • "Agent" as a scoped prompt package — what many teams call a "code review agent" or "testing agent" is often just a well-crafted prompt with specific instructions, not an autonomous system. In our toolset, we call these subagents to avoid the confusion.
  • "Agent" as a single instruction — in some contexts, even "write tests for this code" qualifies as an "agent," stretching the term to near-meaninglessness.
  • "Agentic" as a behavioral property — describes a workflow or tool that exhibits autonomy, tool use, and multi-step reasoning, as opposed to a single request-response exchange. A coding session where the AI reads files, runs tests, and iterates on failures is agentic; a one-shot code generation prompt is not.

In this post, when we say subagent we mean a Claude Code agent instance scoped to a focused task and orchestrated as part of a larger workflow. When we say agentic, we mean a workflow where the AI operates with autonomy across multiple steps. We avoid using "agent" on its own precisely because of the ambiguity above.


About the authors

Maciej Laskowski

Maciej Laskowski - software architect with deep hands-on experience. Continuous Delivery evangelist, architecture trade-offs analyst, cloud-native solutions enthusiast.

Tomasz Michalak

Tomasz Michalak - a hands-on software architect interested in TDD and DDD who translates engineering complexity into the language of trade-offs and goals.

© 2026 HandsOnArchitects.com