Authors: <brwalder@microsoft.com>, <leo.lee@microsoft.com>, <annolan@microsoft.com>, <bokan@google.com>, <khushalsagar@google.com>, <hvanopstal@google.com>

## TL;DR

We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools": JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control.

For the technical details of the proposal, code examples, API shape, etc., see [proposal.md](./docs/proposal.md).

## Terminology Used

###### Agent
An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today, these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based chat interfaces.

###### Browser's Agent
An autonomous assistant as described above but provided by or through the browser. This could be an agent built directly into the browser or hosted by it, for example, via an extension or plug-in.

###### AI Platform
Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini.

###### Backend Integration
A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with an MCP server provided by the service.

###### Actuation
An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc.

## Background and Motivation

The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state.

As AI agents become more prevalent, the potential for even greater user value is within reach. AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations”: each service registers itself with the chosen platform(s), and the platform communicates with the service via an API (MCP, OpenAPI, etc). In this document, we call this style of tool a “backend integration”; users make use of the tools/services by chatting with an AI, and the AI platform communicates with the service on the user's behalf.

Many of the challenges faced by assistive technologies also apply to AI agents, which struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable.

The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools enable reuse of existing frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone.

AI agents can also integrate with services on the backend via protocols like MCP to fulfill a user's task. But to expose a site's functionality this way, a web developer needs to write and operate a server, usually in Python or Node.js, rather than the frontend JavaScript they may already be more familiar with.
There are several advantages to using the web to connect agents to services:

* **Businesses near-universally already offer their services via the web.**

  WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent. This is especially true when the logic is already heavily client-side.

* **Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.**

  Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase. The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short or no sleeves and no embellishments"), and then take back over to browse among the agent-selected options.

* **Allows authors to serve humans and agents from one source.**

  The human-use web is not going away. Integrating agents into it prevents fragmentation of their service and allows them to keep ownership of their interface, branding, and connection with their users.

WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page, and they execute on the client. Page content and actuation remain available to the agent (and the user), but the agent also has access to tools which it can use to achieve its goal more directly.

In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If a UI is required, it must be provided by the agent itself or somehow connected to an existing UI manually.
## Goals

- **Enable human-in-the-loop workflows**: Support cooperative scenarios where users delegate tasks to AI agents or assistive technologies while maintaining visibility and control over the web page(s).
- **Simplify AI agent integration**: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation.
- **Minimize developer burden**: Any task that a user can accomplish through a page's UI can be made into a tool by reusing much of the page's existing JavaScript code.
- **Improve accessibility**: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees, which are not widely implemented.

## Non-Goals

- **Headless browsing scenarios**: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios raise many open questions, such as how browsers are launched and which profiles are used.
- **Autonomous agent workflows**: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. Such tasks are likely better suited to existing protocols like [A2A](https://a2aproject.github.io/A2A/latest/).
- **Replacement of backend integrations**: WebMCP works alongside existing protocols like MCP; it is not a replacement for them.
- **Replace human interfaces**: The human web interface remains primary; agent tools augment rather than replace user interaction.
- **Enable / influence discoverability of sites to agents**
## Use Cases

The use cases for WebMCP are ones in which the user is collaborating with the agent, rather than completely delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated.

### Example - Creative

_Jen wants to create an invitation to her upcoming yard sale, so she uses her browser to navigate to `http://easely.example`, her favorite graphic design platform. However, she's rather new to it and sometimes struggles to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design and opens up a "templates" panel to look for a premade design she likes. There are so many templates that she's not sure which to choose, so she asks her browser agent for help._

**Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a white background so I don't have to print in color.

_The current document has registered a WebMCP tool that the agent notices may be relevant to this query:_
```js
/**
 * Filters the list of templates based on a description.
 *
 * description - A visual description of the types of templates to show, in natural language (English).
 */
filterTemplates(description)
```
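Under the proposed API, a tool like this could be registered from the page's own script, reusing the same function the UI already calls. A minimal sketch: the `navigator.modelContext.registerTool` shape follows the proposal, but the inline fallback object is a hypothetical stub so the snippet runs outside a browser, and `applyTemplateFilter` is an invented placeholder for the page's existing logic.

```js
// Hypothetical stub so the sketch runs outside a browser; a real page would
// get navigator.modelContext from the browser itself.
const modelContext = globalThis.navigator?.modelContext ?? {
  tools: new Map(),
  registerTool(tool) { this.tools.set(tool.name, tool); },
};

// The same client-side function the "templates" panel UI already calls.
function applyTemplateFilter(description) {
  // ...real filtering logic would live here; we just report what was applied.
  return `Filtered templates to: ${description}`;
}

modelContext.registerTool({
  name: "filterTemplates",
  description:
    "Filters the list of templates based on a visual description, in natural language (English).",
  inputSchema: {
    type: "object",
    properties: { description: { type: "string" } },
    required: ["description"],
  },
  async execute({ description }) {
    // Reuse the page's existing logic instead of re-implementing it for agents.
    return { content: [{ type: "text", text: applyTemplateFilter(description) }] };
  },
});
```

When the agent later invokes the tool, the page's own `applyTemplateFilter` runs in the page's context, which is the code-reuse benefit described above.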

_The agent invokes the tool: `filterTemplates("spring themed, date and time displayed prominently, white background")`. The UI updates to show a filtered list matching this description._

**Agent**: Ok, the remaining templates should now match your description.

_Jen picks a template and gets to work._

_The agent notices a new tool was registered when the design was loaded:_

```js
/**
 * Makes changes to the current design based on instructions. Possible actions include modifications to text
 * and font; insertion, deletion, transformation of images; placement and scale of elements. The instructions
 * should be limited to a single task. Here are some examples:
 *
 * editDesign("Change the title's font color to red");
 * editDesign("Rotate each picture in the background a bit to give the design a less symmetrical feel");
 * editDesign("Add a text field at the bottom of the design that reads 'example text'");
 *
 * instructions - A description of how the design should be changed, in natural language (English).
 */
editDesign(instructions)
```

_With all the context of Jen's prompts, page state, and this editDesign tool, the agent is able to make helpful suggestions on next steps:_

**Agent**: Would you like me to make the time/date font larger?

**Jen**: Sure. Could you also swap out the clipart for something more yard-sale themed?

**Agent**: Sure, let me do that for you.

**Jen**: Please fill in the time and place using my home address. The time should be in my e-mail in a message from my husband.

**Agent**: Ok, I've found it - I'll fill in the flyer with Aug 5-8, 2025 from 10am-3pm | 123 Queen Street West.

_Jen is almost happy with the current design but thinks the heading could be better._

**Jen**: Help me come up with a more attention-grabbing headline for the call to action and title.

**Agent**: Of course! Here are some more attention-grabbing headlines for your yard sale flyer, broken down by title and call to action:

To Create Excitement:
 * Yard Sale Extravaganza!
 * The Ultimate Clear-Out Sale
 * Mega Garage & Yard Sale

...

**Jen**: Let's use "Yard Sale Extravaganza!" as the title. Create copies of this page with each of the call to action suggestions.

_The agent takes this action using a sequence of tool calls which might look something like:_

* `editDesign("Change the title text to 'Yard Sale Extravaganza!'")`
* `editDesign("Change the call-to-action text to 'The hunt is on!'")`
* `addPage("DUPLICATE")`
* `editDesign("Change the call-to-action text to 'Ready, set, shop!'")`
* `addPage("DUPLICATE")`
* `editDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")`

_Jen now has 3 versions of the same yard sale flyer. Easely implements these WebMCP tools using AI-based techniques on their backend to allow a natural language interface. Additionally, the UI presents these changes to Jen as an easily reversible batch of "uncommitted" changes, allowing her to review the agent's actions and make changes or undo as necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the browser's agent provides a seamless journey by using tools across multiple sites/services, for example, pulling up information from the user's email service._

**Agent**: Done! I've created three variations of the original design, each with a unique call to action.

_Jen is now happy with these flyers. Normally she'd print to PDF and then take the file to a print shop. However, Easely has a new print service that Jen doesn't know about and doesn't notice in the UI. The agent, though, knows the page has an `orderPrints` tool:_
```js
/**
 * Orders the current design for printing and shipping to the user.
 *
 * copies - A number between 0 and 1000 indicating how many copies of the design to print. Required.
 * page_size - The paper type to use. Available options are [Legal, Letter, A4, A5]. Default is "Letter".
 * page_finish - What kind of paper finish to use. Available options are [Regular, Glossy Photo, Matte Photo].
 *               Default is "Regular".
 */
orderPrints(copies, page_size, page_finish);
```
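Under the proposal, constraints like the ones in the comment above would be conveyed to agents as a structured input schema rather than prose. Below is a sketch of what `orderPrints`'s parameters might look like as JSON Schema, plus a small helper that applies the declared defaults; the field names mirror the pseudocode, but the schema and helper are illustrative, not part of any shipped API.

```js
// Illustrative JSON Schema for the orderPrints parameters described above.
const orderPrintsSchema = {
  type: "object",
  properties: {
    copies: { type: "integer", minimum: 0, maximum: 1000 },
    page_size: { type: "string", enum: ["Legal", "Letter", "A4", "A5"], default: "Letter" },
    page_finish: {
      type: "string",
      enum: ["Regular", "Glossy Photo", "Matte Photo"],
      default: "Regular",
    },
  },
  required: ["copies"],
};

// Tiny helper: fill in declared defaults for any argument the caller omitted.
function withDefaults(schema, args) {
  const filled = { ...args };
  for (const [key, prop] of Object.entries(schema.properties)) {
    if (filled[key] === undefined && "default" in prop) filled[key] = prop.default;
  }
  return filled;
}

console.log(withDefaults(orderPrintsSchema, { copies: 50 }));
// { copies: 50, page_size: 'Letter', page_finish: 'Regular' }
```

A machine-readable schema like this is what lets an agent validate its arguments, and know the legal options, before ever calling the tool.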

_The agent understands the user's intent and so surfaces a small chip in its UI:_

**Agent**: Easely can also print and ship these flyers for you. Would you like me to place an order?

## How Agents Interact with Websites Today

Right now, when an AI agent (like Claude, ChatGPT, or Gemini) wants to do something on a website -- book a flight, fill a form, check a price -- it has two bad options:

**Option A: Screen scraping.** The agent looks at the website like a human would, tries to figure out where the buttons are, and clicks them. This is fragile, slow, and breaks whenever the website changes its layout. It is like trying to operate a machine by looking at a photo of the control panel.

**Option B: Backend API.** The website builds a separate server-side MCP server that the agent connects to. This works well but requires backend engineering, server infrastructure, and maintenance. Many websites will never do this.

**WebMCP is Option C:** the website itself tells the agent what it can do, directly in the browser. The website says: "Here are my tools -- you can search products, add to cart, check availability. Here is what each tool needs as input, and here is what it will give you back." The agent does not need to look at the screen. It just calls the tools.
## Roles in a WebMCP Interaction

###### Agent
An autonomous assistant that understands goals and takes actions. Today, these are typically LLM-based: Claude, ChatGPT, Gemini. The agent is the one calling the tools that websites expose.

###### Browser's Agent
An agent that lives inside the browser itself, rather than in a separate app. Google is building this into Chrome (think of it as an AI assistant built into your browser toolbar). This is different from an external agent like Claude Desktop connecting to the browser.

###### AI Platform
The company providing the agent -- Anthropic, OpenAI, Google. The AI platform's agent connects to WebMCP tools.

###### Web Developer
The person who builds the website. They are the ones who will use WebMCP to register tools on their site.

###### User
The human sitting at the browser. WebMCP is designed for "user-present" interactions -- the human is there, watching, and can be asked for confirmation before the agent does something important.
## MCP, WebMCP, and MCP-B

| What | Who made it | Where it runs | What it does |
|---|---|---|---|
| MCP (Model Context Protocol) | Anthropic | On a server (backend) | The original protocol. Applications expose tools, resources, and prompts to AI models through a server that runs on the backend. Claude Desktop, OpenAI Agents SDK, and many others support it. |
| WebMCP (Web Model Context Protocol) | W3C Web Machine Learning Community Group (Google, Microsoft engineers leading) | In the browser (frontend) | Adapts MCP concepts for the web. Websites expose tools through JavaScript in the browser. No backend server needed. Uses the browser's own security model. Currently a draft specification. |
| MCP-B (MCP for Browser) | Community project (WebMCP-org on GitHub) | Browser extension + JavaScript library | A bridge. Since browsers do not natively support WebMCP yet, MCP-B provides a polyfill (temporary code that fills the gap) implementing the `navigator.modelContext` API, and translates between WebMCP format and the MCP wire protocol so existing MCP clients can talk to WebMCP-enabled sites. |
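The bridge role described in the last row comes down to ordinary feature detection: use `navigator.modelContext` when the browser provides it, otherwise install a polyfill. A hypothetical sketch follows; no browser ships the API natively yet, and `installPolyfill` here is a made-up placeholder, not MCP-B's actual entry point.

```js
// Feature-detect the proposed API; install a fallback when it is absent.
// installPolyfill is a hypothetical placeholder for loading a library
// such as MCP-B.
function ensureModelContext(nav, installPolyfill) {
  if (!nav.modelContext) {
    nav.modelContext = installPolyfill();
  }
  return nav.modelContext;
}

// Demo with a fake navigator and a trivial "polyfill" that records tools.
const fakeNavigator = {};
const ctx = ensureModelContext(fakeNavigator, () => ({
  tools: [],
  registerTool(tool) { this.tools.push(tool); },
}));

ctx.registerTool({ name: "searchProducts", description: "Search the catalog." });
console.log(ctx.tools.length); // 1
```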
## The API Surface

The WebMCP API is surprisingly small. There are only a few pieces, and each one does something specific.

###### `navigator.modelContext`
This is the entry point. `navigator` is a built-in browser object that gives access to browser features (like `navigator.geolocation` gives access to GPS). WebMCP adds `modelContext` to it, so `navigator.modelContext` is where all WebMCP functionality lives.

Every tool you register has these parts:

* A **name** and a natural-language **description** telling the agent what the tool does.
* An **input schema** describing the structured arguments the tool accepts.
* Optional **annotations** such as `readOnlyHint`. If set to `true`, it tells the agent: "This tool only reads data -- it does not change anything." This helps agents decide which tools are safe to call without asking the user first.
* An **`execute`** function that runs when the agent calls the tool.

Here is what a complete tool registration looks like in code:
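Since no browser ships `navigator.modelContext` natively yet, the sketch below stubs both the browser side and the agent client so it can run standalone. The tool shape (name, description, `inputSchema`, `readOnlyHint`, `execute`) follows the pieces described above, but the exact method names, and especially the `requestUserInteraction()` signature, are assumptions against a draft spec.

```js
// Stand-ins for the browser and the agent, so this sketch runs standalone.
const modelContext = {
  tools: new Map(),
  registerTool(tool) { this.tools.set(tool.name, tool); },
  async callTool(name, args, client) {
    return this.tools.get(name).execute(args, client);
  },
};

modelContext.registerTool({
  name: "addToCart",
  description: "Adds a product to the user's shopping cart.",
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string" },
      quantity: { type: "integer", minimum: 1 },
    },
    required: ["productId"],
  },
  annotations: { readOnlyHint: false }, // this tool changes state
  async execute({ productId, quantity = 1 }, client) {
    // State-changing action: pause and ask the human first.
    const approved = await client.requestUserInteraction(
      `Add ${quantity} x ${productId} to your cart?`
    );
    if (!approved) return { content: [{ type: "text", text: "Cancelled by user." }] };
    return { content: [{ type: "text", text: `Added ${quantity} x ${productId}.` }] };
  },
});

// A toy agent client that auto-approves the confirmation prompt.
const agentClient = { requestUserInteraction: async () => true };
modelContext
  .callTool("addToCart", { productId: "sku-123", quantity: 2 }, agentClient)
  .then((result) => console.log(result.content[0].text)); // Added 2 x sku-123.
```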
The `execute` function receives the tool's arguments and a `client` object representing the agent. This client object has one crucial method: `requestUserInteraction()`.

## Security Model

The web has a concept called "origin": the combination of protocol + domain + port. For example, `https://amazon.com` is one origin, and `https://evil-site.com` is a different origin. Browsers enforce strict rules about what one origin can access from another.

WebMCP inherits this model. A tool registered on amazon.com can only access amazon.com's data. An agent calling that tool operates within amazon.com's security boundary. A malicious site cannot register tools that access another site's data.
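The origin rule is mechanical, and the standard `URL` API computes it directly. A quick illustration (the hostnames are made up):

```js
// An origin is scheme + host + port; the standard URL API computes it.
const a = new URL("https://shop.example.com/cart?item=1");
const b = new URL("https://shop.example.com:443/checkout");
const c = new URL("https://evil.example.net/cart");

console.log(a.origin);              // https://shop.example.com
console.log(a.origin === b.origin); // true: 443 is the default HTTPS port
console.log(a.origin === c.origin); // false: different host, different origin
```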
The spec requires `SecureContext`, which means WebMCP only works on HTTPS pages (encrypted connections). It will not work on plain HTTP. This prevents eavesdropping on tool calls.
WebMCP is designed for situations where the user is present at the browser. This is different from server-side MCP, where agents might operate autonomously in the background. The user-present assumption is why `requestUserInteraction()` exists -- the spec expects a human to be available for confirmation.
## The Permission Gap

A key question for the WebMCP specification: currently, any website can register any number of tools the moment a user visits it. An AI agent connected to the browser can immediately discover and potentially call those tools. There is no step where the user sees: "This website wants to expose 12 tools to your AI agent. Allow?"

Compare this to how other browser capabilities evolved:

| Capability | Permission model |
|---|---|
| Camera / Microphone | Browser shows a prompt: "This site wants to use your camera. Allow / Block" |
| Location (GPS) | Browser shows a prompt: "This site wants to know your location. Allow / Block" |
| Notifications | Browser shows a prompt: "This site wants to send you notifications. Allow / Block" |
| WebMCP tools | Currently: no prompt. Tools are silently registered and discoverable. |

This does not mean WebMCP is dangerous right now. The `requestUserInteraction()` mechanism provides per-action consent. But it means an agent could discover tools without the user knowing, even if it needs permission to execute them.
## Standardization Status

WebMCP is a Draft Community Group Report. In W3C terms, this means it is a proposal being discussed in a Community Group (the Web Machine Learning CG). It is not yet on the W3C Standards Track, and it is not a W3C Recommendation (the final stage of a web standard).

The spec is early and open to change. This is exactly the right time to contribute -- before designs are locked in. The spec team is actively soliciting feedback.

Typically, a Community Group Report leads to a Working Group charter, which leads to a Working Draft, then Candidate Recommendation, then full W3C Recommendation. This process takes years. Chrome may implement experimental support (behind a flag) much sooner.

The W3C community structure provides established channels for participation. Technical notes, tooling, and quality infrastructure are complementary contributions that help the specification succeed by addressing practical implementation concerns.
## Glossary Notes

* The `execute` function is a callback -- you define it, but the agent triggers it when it calls your tool.
* `https://amazon.com:443` is one origin. Two different origins cannot access each other's data. This is the foundation of web security, and the foundation of WebMCP security.
* If you see `interface`, `dictionary`, or `readonly attribute` in the spec, that is WebIDL -- the blueprint language for browser APIs.