AI Toolset for Software Architects (Q1 2026)

26 February 2026, 12 min read


Artificial intelligence has firmly settled into everyday architect work. The question is no longer which AI tool to adopt — it is how to curate, evolve, and integrate a living toolset that covers the full software development lifecycle. Since our Q3 2025 edition, the landscape has shifted meaningfully: two new categories were added, one contender we anticipated (OpenAI Codex) fell short in practice, and our prompting philosophy has matured into something that cuts across everything we do.

This post covers the current state of our AI toolset as of February 2026, highlights what changed, and shares the lessons we picked up along the way. If you are new to the series, the Q3 2025 edition covered four categories — Searching & Learning, Brainstorming & Ideation, Prompt Engineering, and Agentic Coding & Prototyping. Here is what shifted since then.

How the Architect's Role Continues to Evolve with AI

Three principles guide how we build and maintain our toolset:

  1. Build a complementary toolset, not one perfect tool. No single AI covers everything. Combine tools that cover research, design, planning, coding, and quality — each doing what it does best.
  2. Treat your toolset as continuously evolving. Adopt an experimentation habit rather than searching for a stable, final configuration. What works today may be outpaced in a quarter — or sometimes within a week.
  3. Leverage AI across all lifecycle stages. From early research and design through planning, implementation, and into maintenance — there is no phase where AI cannot accelerate or augment your work.

What Changed Since Q3 2025

The most significant structural change in our toolset is that Prompt Engineering is no longer its own category. It has become a metacategory — a cross-cutting discipline that shapes how we interact with every other tool. Good prompting is now an integral part of all our AI workflows, not a specialized skill in one workflow stage. (See Prompt Engineering as a Metacategory below for what this means in practice.)

Equally significant: we expanded into two new areas that were missing from the previous edition:

  • Planning & Project Management — we started actively using AI to support backlog grooming, task decomposition, and integration with project management tools.
  • Quality Assurance / Continuous Integration — we now use AI-assisted hooks, code review automation, and browser testing tools as part of our CI feedback loops.

The resulting five-category model better reflects where architects actually spend their time.

AI Tool Categories — Q1 2026

1. Searching & Learning

Perplexity.ai remains our primary tool for quick search across sources, exploring new technologies, and summarizing recent research. What has changed is how we use it: increasingly via its API inside more complex, multi-step workflows rather than as a standalone chat interface.

A typical pattern looks like this:

  1. Use Perplexity API to search and identify relevant sources on a topic.
  2. Pipe results into Firecrawl to scrape and extract full content from the most promising pages.
  3. Summarize and synthesize using an LLM (e.g., Claude) to extract key insights from the aggregated content.

This pipeline makes AI-powered research significantly more thorough and reproducible than ad-hoc searches. It is especially effective when you need to validate architectural assumptions against current documentation, vendor blogs, and community discussions simultaneously.
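The pipeline above can be sketched as a small orchestrator. The function boundaries and names here are our own illustration: in a real setup, `search` would wrap a Perplexity API call, `scrape` a Firecrawl request, and `summarize` an LLM call (e.g., Claude). Injecting each stage as a plain callable keeps the pipeline itself testable without network access.

```python
from typing import Callable, Iterable

def research_pipeline(
    topic: str,
    search: Callable[[str], Iterable[str]],   # e.g., Perplexity API -> candidate URLs
    scrape: Callable[[str], str],             # e.g., Firecrawl -> page content
    summarize: Callable[[str], str],          # e.g., Claude -> synthesized insights
    max_sources: int = 5,
) -> str:
    urls = list(search(topic))[:max_sources]        # 1. identify relevant sources
    pages = [scrape(url) for url in urls]           # 2. extract full content
    corpus = "\n\n---\n\n".join(pages)              # 3. aggregate the material
    return summarize(f"Topic: {topic}\n\n{corpus}") # 4. summarize and synthesize
```

Because the stages are decoupled, swapping one provider for another (for example, a different search backend) does not touch the rest of the pipeline.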

2. Brainstorming & Ideation

Chat interfaces continue to work well for early-stage thinking. We use both ChatGPT and Claude (the chat interface, not Claude Code) depending on the task — neither has displaced the other in this category. What matters more than the model is the quality of the prompt and the clarity of what you are trying to explore.

Brainstorming use cases remain largely unchanged: generating design alternatives, stress-testing assumptions, exploring edge cases before committing to a direction, and facilitating design discussions when you want a second (non-human) perspective.

3. Planning & Project Management

This is a new category in our toolset — and one of the more nuanced additions.

For smaller projects, we prefer simple Markdown-based backlogs and task lists maintained in the repository. They are lightweight, version-controlled, and work well with AI tools that can read and update them directly.

For larger projects involving multiple team members, we use the Atlassian MCP to interact with Jira within Claude Code sessions. However, there is an important practical caveat: the Atlassian MCP is token-heavy and consumes significant context. Our recommendation is to avoid running it continuously. Instead, adopt a sync-work-sync pattern:

  1. Sync Jira → Markdown (pull the current state into local files).
  2. Do the planning and task management work in Markdown.
  3. Sync Markdown → Jira (push updates back) in a separate session.

This keeps your AI sessions focused and avoids flooding the context window with Jira API overhead.
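A minimal sketch of the Markdown side of that sync. The issue shape mirrors what Jira's REST search endpoint returns (key, summary, status), but the field selection, the checkbox convention, and which statuses count as "done" are our own illustrative choices, not a prescribed format.

```python
# Render Jira issues (e.g., fetched via GET /rest/api/2/search in a
# separate sync step) into a version-controlled Markdown task list.
DONE_STATUSES = {"Done", "Closed", "Resolved"}  # project-specific choice

def issues_to_markdown(issues: list[dict]) -> str:
    lines = ["# Backlog", ""]
    for issue in issues:
        status = issue["fields"]["status"]["name"]
        box = "x" if status in DONE_STATUSES else " "
        lines.append(f"- [{box}] {issue['key']}: {issue['fields']['summary']}")
    return "\n".join(lines) + "\n"
```

The resulting file is lightweight enough for AI tools to read and update directly during the "work" phase, with the reverse sync pushing changes back to Jira afterwards.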

A broader observation: Jira is becoming less useful for engineering teams. In practice, it primarily serves project managers who need visibility, while adding friction for engineers. We are actively exploring lighter alternatives, starting with Markdown-based backlogs in Git repositories, and watching Linear as a potential replacement. Some of the more technical project managers have already started using Claude Code to "ask" Markdown-based backlogs for status updates and to prepare updates for stakeholders, which is a far more efficient workflow than navigating Jira's UI for the same purpose.

4. Coding and Prototyping

Claude Code remains our frontier tool for agentic coding. After a brief experiment with OpenAI Codex, we returned to Claude Code — it remains ahead in terms of agentic behavior, context handling, and integration ecosystem.

Our current focus is on building skill and subagent libraries for specific projects and on experimenting with those building blocks in bigger teams (3+ engineers), with varying degrees of success so far. Rather than relying on general-purpose prompting (or off-the-shelf prompt templates and frameworks), we invest time in building project-specific, high-quality:

  • Skills: reusable, task-specific prompt packages (slash commands) that encode domain knowledge, constraints, and preferred workflows for a given project context. Think of them as documented expert procedures that any agent session can load.
  • Subagents: specialized Claude Code agent instances scoped to a focused task — for example, a Domain-Driven Design expert that helps identify bounded contexts, a Code Reviewer tuned to a specific language or framework, or a Trade-off Analyst that evaluates architectural options against defined criteria.

The distinction matters: a skill is a single-purpose, reusable prompt package, while a subagent is a scoped execution context that can run in parallel sessions and be orchestrated alongside other subagents. Skills are what subagents follow.

Once the library of skills and subagents is in place, we execute project backlogs using teams of subagents operating with those specific skills. This dramatically improves consistency and reduces the per-task prompting overhead.
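As a concrete illustration, a skill can be as simple as a Markdown file checked into the repository. The path and contents below are a hypothetical sketch (Claude Code loads custom slash commands from Markdown files under `.claude/commands/`, with `$ARGUMENTS` standing in for the command's arguments); the review checklist itself is an example, not our actual skill.

```markdown
<!-- .claude/commands/review-adr.md — loaded as the /review-adr slash command -->
Review the architecture decision record in $ARGUMENTS.

Check, in order:
1. Is the context section specific enough to reconstruct the problem?
2. Are at least two alternatives listed with explicit trade-offs?
3. Does the decision state its consequences, including negative ones?

Report findings as a Markdown checklist; do not edit the file.
```

Because the skill lives in the repository, it is versioned, reviewable, and available to every agent session (or subagent) working on that project.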

We continue to use a small, curated set of MCPs:

  • Serena — codebase querying; works especially well with larger codebases.
  • Context7 — documentation browsing during coding sessions.
  • Sequential Thinking — structured planning and multi-step reasoning.

Keeping the MCP list small and intentional is a deliberate choice. Each MCP adds context overhead and potential noise — more is not always better (we are aware of Claude's MCP auto-discovery feature, but managing context explicitly has proven more reliable).

5. Quality Assurance / Continuous Integration

This is the other new category, and it is one of the most promising areas for architects who want to influence engineering standards at scale.

Claude Code Hooks are our first "tool" here. Hooks are lifecycle event handlers in Claude Code that can execute deterministic checks at specific points in the agent workflow. A practical example: attach a script to the Stop event (or SubagentStop if you are working with subagent orchestration) that automatically runs tests and linters after each agent completes its task. This turns AI-generated code into code that is continuously validated — not just generated.
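For illustration, a Stop hook of that kind is configured in Claude Code's settings file. The script path is a placeholder of our own, and while the structure below follows the hooks configuration shape as we understand it, verify the field names against the current Claude Code documentation before relying on them.

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/run-tests-and-lint.sh"
          }
        ]
      }
    ]
  }
}
```

The same shape applies to SubagentStop, so an orchestrated team of subagents can have each member's output validated the moment it finishes.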

GitHub Copilot for code review has improved noticeably. AI-assisted code review now covers basic quality checks reliably, and we are increasingly comfortable using it as a first pass before human review — particularly for generated code, configuration files, or changes that are high-volume but lower-risk. We still keep humans in the loop for anything critical or architecturally significant.

Chrome DevTools MCP rounds out this category. It provides a simple but effective way to give AI visibility into web application behavior — inspecting the DOM, checking network requests, and validating rendering — without leaving the agent workflow. For web-focused projects, this closes a gap that previously required manual browser inspection.

Prompt Engineering as a Metacategory

In Q3 2025, Prompt Engineering had its own category. It no longer does — not because it matters less, but because it matters everywhere.

Good prompting is now the backbone of how we interact with every tool in the list. The principles that emerged from dedicated prompt engineering practice — clear context, well-scoped tasks, explicit constraints, iterative refinement — apply equally to Perplexity searches, Jira syncs, Claude Code subagent configurations, and QA hooks.

Treating prompting as a metacategory is also a signal to teams: it is an engineering discipline, not a soft skill. It deserves the same investment as writing good tests or designing good APIs.

To Explore

The toolset never stands still. Here is what is on our radar for the coming quarter:

  • Linear — piloting as a lighter replacement for Jira, with a developer experience that better fits engineering-first teams.
  • Brave Search — evaluating as an alternative or complement to Perplexity for searching and learning, particularly for privacy-sensitive research contexts.
  • Playwright CLI with AI — exploring AI-assisted testing capabilities through Playwright's CLI interface.
  • GitHub Agentic Workflows (preview) — define GitHub workflows in Markdown rather than YAML alone. A promising direction for extending what CI pipelines can do.

Our Current Thoughts on Working with AI Tools

After another quarter of hands-on work, three field notes from the trenches stand out:

AI accelerates thinking — it does not replace it. AI is a force multiplier for engineering judgment, not a substitute. The more architectural maturity you bring to AI collaboration, the better your outcomes. Architects who lean on AI to skip the thinking tend to get mediocre results faster.

Prompting is an engineering discipline. Own the context. We consistently observe a significant difference between sessions that have too little context (under-informed, generic output) and sessions that have too much context (noisy, unfocused output). There is no silver bullet — but the habit of researching and documenting context close to the task before starting an AI session pays dividends every time.

AI amplifies both good and bad practices. If your processes are sloppy, AI will make them sloppier at scale. If your architectural thinking is sharp, AI will amplify it. This is perhaps the strongest argument for investing in learning and foundational understanding: in a landscape where implementation details can be delegated to AI, the competitive advantage lies in understanding the why behind those details — well enough to guide, review, and correct the AI's output.

Closing Principles

There is still no final or perfect AI toolset for software architects. Quarter by quarter, the practice comes down to three durable commitments:

  • Continuous evolution: adopt, test, and retire tools as needs change. One quarter's frontier tool may be the next quarter's baseline — OpenAI Codex being a fresh example from this edition.
  • Balanced coverage: ensure the toolset spans research, ideation, planning, coding, and quality. Gaps in any stage create bottlenecks that slow the whole team.
  • Architect mindset first: tools augment judgment; they do not replace it. The most valuable thing you bring to any AI collaboration is clear thinking about what you actually need and why.

The architects who thrive in this environment are not those who adopt the most tools — they are the ones who build the habit of thoughtful experimentation and honest evaluation, one quarter at a time.


The following is a terminology note for readers less familiar with how "agent" and "agentic" are used in this post and in the broader industry.

Agent or Agentic

"Agent" and "agentic" are among the most overloaded terms in the AI space — right alongside "artifact" or "service." They mean very different things depending on context, and the lack of consensus creates real confusion. Here is how the ambiguity plays out in practice:

  • "Agent" as a runtime orchestrator — tools like Claude Code that autonomously plan, execute, and iterate on multi-step tasks. This is the strongest sense of the word: an agent that reasons about what to do next and acts on it.
  • "Agent" as a scoped prompt package — what many teams call a "code review agent" or "testing agent" is often just a well-crafted prompt with specific instructions, not an autonomous system. In our toolset, we call these subagents to avoid the confusion.
  • "Agent" as a single instruction — in some contexts, even "write tests for this code" qualifies as an "agent," stretching the term to near-meaninglessness.
  • "Agentic" as a behavioral property — describes a workflow or tool that exhibits autonomy, tool use, and multi-step reasoning, as opposed to a single request-response exchange. A coding session where the AI reads files, runs tests, and iterates on failures is agentic; a one-shot code generation prompt is not.

In this post, when we say subagent we mean a Claude Code agent instance scoped to a focused task and orchestrated as part of a larger workflow. When we say agentic, we mean a workflow where the AI operates with autonomy across multiple steps. We avoid using "agent" on its own precisely because of the ambiguity above.


About the authors

Maciej Laskowski

Maciej Laskowski - a software architect with deep hands-on experience: Continuous Delivery evangelist, architecture trade-off analyst, and cloud-native solutions enthusiast.

Tomasz Michalak

Tomasz Michalak - a hands-on software architect interested in TDD and DDD who translates engineering complexity into the language of trade-offs and goals.

© 2026 HandsOnArchitects.com