---
title: "How AI Agents Choose Tools: Agentic Discovery | Minds"
canonical_url: "https://getminds.ai/blog/how-ai-agents-choose-tools-mcp-discovery-mechanics"
last_updated: "2026-06-22T02:07:18.635Z"
meta:
  description: "Inside the mechanics of how Claude, ChatGPT, and Cursor decide which MCP server to call. What signals matter and how to be the tool that gets picked."
  "og:description": "Inside the mechanics of how Claude, ChatGPT, and Cursor decide which MCP server to call. What signals matter and how to be the tool that gets picked."
  "og:title": "How AI Agents Choose Tools: Agentic Discovery | Minds"
  "twitter:description": "Inside the mechanics of how Claude, ChatGPT, and Cursor decide which MCP server to call. What signals matter and how to be the tool that gets picked."
  "twitter:title": "How AI Agents Choose Tools: Agentic Discovery | Minds"
---

May 6, 2026·Research·Minds Team

# How AI Agents Choose Tools: Agentic Discovery

Inside the mechanics of how Claude, ChatGPT, and Cursor decide which MCP server to call. What signals matter and how to be the tool that gets picked.

When a marketer asks Claude "find synthetic customer panels for B2B SaaS in Germany," the model doesn't open a search engine. It reads the descriptions of every MCP tool currently connected to its session, ranks them, and picks one. That ranking is the new front page of the internet for tool discovery, and almost no one optimizes for it.

This post looks under the hood: how the ranking actually works in 2026, which signals move it, and what that means in practice for any team shipping an MCP server.

## What "Discovery" Means in an Agentic Context

There are two layers of discovery to keep separate.

_Layer 1, registry-level._ The user (or the agent itself) finds an MCP server in a directory and connects it to the client. This is browser-style discovery: agent-friendly registries, OAuth flows, "add this connector" buttons. The MCP registry, Anthropic's directory, OpenAI's Apps SDK directory, and `mcpmarket.com` all play this role. Once the user clicks Connect, the server is in the agent's tool list.

_Layer 2, in-session discovery._ Every time the user sends a message, the agent must decide which tool, if any, to call. This is the layer that actually moves usage. A server can be connected and never called for months because the agent always picks something else. This is where the real ranking happens, and almost no one talks about it.

The rest of this post is about Layer 2.

## What the Agent Actually Sees

When an MCP server connects, the client asks it for its tool list (the `tools/list` request), and the server responds with a definition of every tool it exposes. For each tool the agent receives:

- A name (e.g. `create_panel`)
- A short description (1 to 3 sentences, written by the server author)
- A JSON schema for the input parameters
- Optional structured metadata (annotations, examples, return type)

That is the entire surface. The agent does not see your website, your pricing, your documentation, or your blog. The single most leveraged piece of writing on your entire product is the description string in your tool registration.
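
As a rough illustration, the surface described above might look like this for a single tool. The keys `name`, `description`, and `inputSchema` follow the MCP tool definition; the tool's parameters and the `agent_visible_surface` helper are hypothetical, and the rendering is only one plausible way a host could present a tool to the model.

```python
import json

# Hypothetical tool definition; only the shape matters here.
CREATE_PANEL_TOOL = {
    "name": "create_panel",
    "description": (
        "Run a synthetic customer panel against an audience persona "
        "and return summarized findings."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "audience": {
                "type": "string",
                "description": "Plain-language audience, e.g. 'B2B SaaS buyers in Germany'.",
            },
            "question": {"type": "string", "description": "What to ask the panel."},
            "panel_size": {"type": "integer", "minimum": 1},
        },
        "required": ["audience", "question"],
    },
}


def agent_visible_surface(tool: dict) -> str:
    """Render one plausible view of what the model reads for a tool.

    Real hosts format this differently; the point is that nothing
    outside these three fields is available to the model.
    """
    return "\n".join([
        f"name: {tool['name']}",
        f"description: {tool['description']}",
        f"input schema: {json.dumps(tool['inputSchema'], sort_keys=True)}",
    ])


print(agent_visible_surface(CREATE_PANEL_TOOL))
```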

The implication is uncomfortable: a brilliant tool with a vague description loses to a mediocre tool with a sharp description, every time.

## The Ranking Process, Step by Step

When the user sends a message, the agent does roughly this:

1. _Filter to plausibly relevant tools._ The model reads its own context (the conversation so far, the user's latest message) and identifies a candidate set. With a small toolset (under 20 tools), it usually considers all of them. With a large toolset, it filters first.
2. _Score by description match._ For each candidate tool, the model evaluates how well the description matches the user's intent. This is a soft, semantic match, not a keyword match. Synonyms work. Domain language works. Vague descriptions fail.
3. _Compose a call._ If a tool is selected, the model fills in parameters from the conversation context. Tools whose schemas require ambiguous fields (e.g. an unparameterized "options" object) get penalized because the model is less confident it can call them correctly.
4. _Optionally chain calls._ For multi-step tasks, the model picks the first tool, executes it, reads the result, and repeats. Tools that return structured, agent-readable output earn follow-up calls. Tools that return wall-of-text output stall the chain.

Each selection happens inside a single inference pass; a chained task simply repeats that pass once per call. There is no separate ranker model. There is no telemetry feedback loop (yet). The decision is made on the descriptions and the schema, period.
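
The control flow around those steps can be sketched as a small loop. This is a toy, not any host's implementation: `run_agent_turn`, `stub_choose`, and `stub_execute` are hypothetical names, and the `choose` callable stands in for one model inference pass (the filtering, scoring, and parameter filling all happen inside the model, not in code like this).

```python
from typing import Callable, Optional

# One "choice" is the outcome of a single inference pass:
# either (tool_name, arguments) or None when no tool is needed.
Choice = Optional[tuple[str, dict]]


def run_agent_turn(
    user_message: str,
    tools: list[dict],
    choose: Callable[[list[str], list[dict]], Choice],
    execute: Callable[[str, dict], str],
    max_calls: int = 5,
) -> list[str]:
    """Choose a tool, execute it, append the result, then choose again."""
    transcript = [f"user: {user_message}"]
    for _ in range(max_calls):
        choice = choose(transcript, tools)  # one inference pass per decision
        if choice is None:  # the model decides no further tool is needed
            break
        name, args = choice
        transcript.append(f"tool {name}: {execute(name, args)}")
    return transcript


def stub_choose(transcript: list[str], tools: list[dict]) -> Choice:
    """Stand-in for the model: follow a fixed two-step plan, then stop."""
    plan = [
        ("create_panel", {"audience": "B2B SaaS buyers in Germany"}),
        ("summarize_panel", {"panel": "latest"}),
    ]
    calls_so_far = sum(1 for line in transcript if line.startswith("tool "))
    return plan[calls_so_far] if calls_so_far < len(plan) else None


def stub_execute(name: str, args: dict) -> str:
    """Stand-in for the MCP call; a real server would return structured output."""
    return f"ok({name})"


for line in run_agent_turn("Ask my customers about pricing", [], stub_choose, stub_execute):
    print(line)
```

The loop makes the chaining point concrete: every follow-up call is a fresh decision made with the previous tool's output in context, so output the model can parse directly feeds the next choice.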

## What Actually Moves the Ranking

Working backwards from observed agent behaviour, four things demonstrably move tool selection:

_Description specificity._ "Run market research" loses to "Run a synthetic customer panel against an audience persona and return summarized findings." The longer description matches more queries because it surfaces more handles for the model to grab. There is a budget (most agents truncate descriptions past ~500 characters) but most servers are nowhere near it.

_Verb-subject-object structure._ Agents pick tools whose descriptions match the verb the user used. "Ask my customers" matches `ask_panel` better than `query_panel_responses`. Naming and description should both lead with the action.

_Concrete output shape._ "Returns a JSON object with `panel_id`, `responses`, and `summary` fields" beats "returns the panel result." Agents are more likely to call tools when they can predict what to do with the output.

_Schema parsability._ Schemas with required fields the model can fill from context (text descriptions, numeric counts) get called. Schemas with required fields that need user input mid-call (auth tokens, internal IDs) get skipped in favour of tools that can run end-to-end.
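
One way to audit a schema for this last point is a small check over its required fields. The heuristics below are assumptions made for this sketch (no host publishes rules like these), and `hard_to_fill_fields` is a hypothetical helper name.

```python
def hard_to_fill_fields(input_schema: dict) -> list[str]:
    """Flag required fields a model is unlikely to fill from conversation context.

    Heuristics (assumptions for illustration only):
    - object-typed fields with no declared properties (an open "options" bag)
    - field names that suggest credentials or internal identifiers
    """
    flagged = []
    properties = input_schema.get("properties", {})
    for field in input_schema.get("required", []):
        spec = properties.get(field, {})
        if spec.get("type") == "object" and not spec.get("properties"):
            flagged.append(f"{field}: unparameterized object")
        elif field.endswith("_id") or "token" in field.lower():
            flagged.append(f"{field}: needs a value the user must supply")
    return flagged


# A schema the model would struggle to call end-to-end.
vague_schema = {
    "type": "object",
    "properties": {
        "options": {"type": "object"},
        "workspace_id": {"type": "string"},
    },
    "required": ["options", "workspace_id"],
}

# A schema whose required fields can be filled from the conversation.
clear_schema = {
    "type": "object",
    "properties": {
        "audience": {"type": "string"},
        "question": {"type": "string"},
    },
    "required": ["audience", "question"],
}

print(hard_to_fill_fields(vague_schema))
print(hard_to_fill_fields(clear_schema))
```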

## What Doesn't Move Ranking (Yet)

Several things get talked about as if they matter but, as of 2026, don't:

- _Star counts on the registry._ They help discoverability at Layer 1 but are irrelevant at Layer 2.
- _SEO-style keyword stuffing._ The model semantic-matches; it doesn't keyword-match. Cramming "agentic research AI panels MCP" into the description doesn't help.
- _Brand recognition._ The model has no preference for established brands over unknown ones at the in-session layer. Description quality wins.
- _Latency under 500ms._ The model doesn't time tool calls when ranking. Slow but useful tools still get called.

This will change. Eval scores, post-call satisfaction signals, and anti-spam ranking are all on the roadmap for the major hosts. Today, the description is the lever.

## The Anti-Spam Problem

The natural consequence of all this is that anyone can try to game the ranking by padding a description with every use case and phrase they can think of. The hosts know this. Anthropic, OpenAI, and the MCP registry maintainers have all started to push back on description-stuffing.

Two anti-spam mechanisms are emerging:

_Schema validation._ Tools whose declared schema does not match their actual response shape get downranked or removed.

_Cross-host eval scoring._ The MCP registry is piloting a public eval suite that runs prompts against registered servers and reports correctness scores. Servers that fail the eval get warnings, then removal.

Neither is fully live as of mid-2026, but both are coming. The posture to take: write the description that would win in a quality-scored world, not just a keyword-matched one.
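
To make the schema validation idea concrete, here is a minimal sketch of comparing a declared output shape with an actual response. It is not how any host or registry implements the check; `shape_mismatches` is a hypothetical helper that covers only top-level required keys and basic JSON types, written with the standard library alone.

```python
# Map JSON Schema type names to Python types (top-level checks only).
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def shape_mismatches(declared: dict, response: dict) -> list[str]:
    """List ways a response deviates from the declared output schema."""
    problems = []
    for key in declared.get("required", []):
        if key not in response:
            problems.append(f"missing required field: {key}")
    for key, spec in declared.get("properties", {}).items():
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name)
        if key not in response or expected is None:
            continue
        value = response[key]
        matches = isinstance(value, expected)
        # bool is a subclass of int in Python, so reject it for numeric types.
        if isinstance(value, bool) and type_name in ("integer", "number"):
            matches = False
        if not matches:
            problems.append(f"wrong type for {key}: expected {type_name}")
    return problems


DECLARED_OUTPUT = {
    "type": "object",
    "properties": {
        "panel_id": {"type": "string"},
        "responses": {"type": "array"},
        "summary": {"type": "string"},
    },
    "required": ["panel_id", "responses", "summary"],
}

actual = {"panel_id": "p_123", "responses": "see attached text"}
print(shape_mismatches(DECLARED_OUTPUT, actual))
```

Running a check like this against your own server's responses is a cheap way to find drift before an external validator does.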

## Practical Recommendations for Server Authors

If you ship an MCP server, the following changes will measurably improve in-session selection:

1. _Rewrite every tool description to lead with the verb the user would use._ Not "Panel runner" but "Run a customer research panel against a target audience and return summarized responses."
2. _Specify the output shape in the description._ "Returns a JSON object with a `responses` array, a `themes` field, and a `summary` field." This makes the agent confident enough to chain follow-up calls.
3. _Make required fields fillable from context._ If a field needs an internal ID, accept a name and resolve it server-side. Agents skip tools whose required fields they can't predict.
4. _Use 200 to 400 characters per tool description._ Below 100 is too thin. Above 500 is truncated by most clients.
5. _Audit your tool count._ Servers with more than 30 tools get filtered down before the agent even ranks them. Combine related tools where possible. We've seen 60-tool servers get worse selection than 12-tool ones because the model never sees the long tail.
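
Recommendations 4 and 5 are easy to check mechanically. The sketch below applies the thresholds from the list above (100, 200 to 400, and 500 characters; 30 tools) to a list of tool definitions. `lint_server` is a hypothetical helper, and the numbers are this post's rules of thumb rather than limits from any specification.

```python
def lint_server(tools: list[dict]) -> list[str]:
    """Warn about tool count and description lengths using the thresholds above."""
    warnings = []
    if len(tools) > 30:
        warnings.append(f"{len(tools)} tools: consider combining related tools")
    for tool in tools:
        n = len(tool.get("description", ""))
        if n < 100:
            warnings.append(f"{tool['name']}: description too thin ({n} chars)")
        elif n > 500:
            warnings.append(f"{tool['name']}: description may be truncated ({n} chars)")
        elif not 200 <= n <= 400:
            warnings.append(f"{tool['name']}: outside the 200-400 char target ({n} chars)")
    return warnings


tools = [
    {"name": "panel_runner", "description": "Panel runner"},
    {
        "name": "run_panel",
        "description": (
            "Run a customer research panel against a target audience and "
            "return summarized responses. Returns a JSON object with a "
            "`responses` array, a `themes` field, and a `summary` field."
        ),
    },
]

for warning in lint_server(tools):
    print(warning)
```

A check like this fits naturally in CI, so a description that drifts below the floor or past the truncation point is caught before release.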

The teams who treat these descriptions as their most important copy are getting called. The teams who treat them as registry plumbing are not.

## Where This Goes Next

The mechanics will harden over the next 12 months. Expect three changes:

_Eval-based ranking goes live._ Quality scores from automated evals will start appearing in registry listings and will influence in-session selection on at least one major host.

_Agent telemetry feedback loops._ The first major host to ship "tools that produced satisfying results in past sessions are ranked higher" will lap the others. This is the agentic equivalent of click-through-rate, and it changes the optimization target.

_Vertical agent ecosystems._ Marketing agents, sales agents, and research agents will each develop their own standard tool stacks. Being the default in your vertical will matter more than being listed in every directory.

The tools that win in this transition are the ones that treat their MCP descriptions as the front page of their product. Those that don't will not get called, however good the underlying service is.

---

For the bigger picture on why agents are the new buyers at all, see [AI agents are the new marketing buyer](https://getminds.ai/blog/ai-agents-new-marketing-buyer-agentic-discovery). For the practical setup, see [how to run customer panels from Claude, ChatGPT, or Cursor](https://getminds.ai/blog/run-customer-panels-from-claude-chatgpt-cursor-mcp-guide). To see what a well-described MCP server looks like in production, see [Minds MCP](https://getminds.ai/mcp/overview).