diff --git a/README(generator).md b/README(generator).md new file mode 100644 index 0000000..4cef924 --- /dev/null +++ b/README(generator).md @@ -0,0 +1,131 @@ +# WebMCP Model Card Generator + +Generate standardized documentation for WebMCP browser-side tools. + +Extends [MCP Model Card Specification v1.0](https://github.com/Starborn/MCP-Model-Card-Generator) for the browser ecosystem. + +## Use It Now + +**Open `index.html` in any browser.** That's it. No installation, no build step, no dependencies to manage. + +If hosted on GitHub Pages: [Open the Generator](https://starborn.github.io/WebMCP-Model-Card-Generator/) + +You can also paste the `.jsx` file into a Claude artifact to run it inside Claude. + +## What It Does + +You fill in a 12-tab form. The generator produces two files: + +- **JSON** -- machine-readable model card for registries, agents, and validators +- **Markdown** -- human-readable documentation for GitHub, specs, and READMEs + +Each tab has: +- **Micro-guidance** right above every field telling you exactly what to type +- **Tutorial sidebar** (toggle on/off) explaining MCP vs WebMCP differences +- **Smart defaults** so you can generate a valid card without filling everything + +## What is a WebMCP Model Card? + +A model card is like a nutrition label for a tool. It tells AI agents and developers what your tool does, what it needs, what can go wrong, and how to use it safely. + +[WebMCP](https://webmachinelearning.github.io/webmcp/) is a W3C draft specification (Feb 2026) that lets web pages expose JavaScript functions as structured tools for AI agents. Instead of agents scraping your page, you register tools they can call directly. + +This generator documents those tools in a standardized format, extending the MCP Model Card Specification v1.0 that covers backend MCP servers. + +## The 12 Sections + +1. **Identity & Provenance** -- name, version, author, attribution (human/AI co-created/AI-generated) +2. **API Mode** -- declarative (HTML forms) vs imperative (JavaScript) vs both +3. **Tool Documentation** -- each tool's name, description, input schema, error states +4. **Session & Auth** -- authenticated vs anonymous, browser session inheritance +5. **Browser Compatibility** -- Chrome version, MCP-B polyfill, graceful degradation +6. **Security** -- same-origin scope, CSP, exfiltration risk, lethal trifecta assessment +7. **Interaction Patterns** -- statefulness, tool dependencies, page state modification +8. **Context Requirements** -- provideContext() data, DOM/framework state access +9. **Limitations** -- what the tool cannot do, edge cases, failure modes +10. **Discovery** -- registry listing, .well-known/webmcp, pre-visit discoverability +11. **Testing** -- methodology, which agents tested, pass/fail criteria +12. **Backend MCP Relationship** -- does a corresponding backend server exist? + +## How This Was Built + +### Architecture + +The generator is a single-page React application compiled at runtime in the browser. No build tools, no npm, no webpack, no server. + +### Technical Stack + +- **React 18** -- loaded from CDN (cdnjs.cloudflare.com), provides the component model and state management +- **Babel Standalone** -- loaded from CDN, compiles JSX (React's HTML-like syntax) to JavaScript directly in the browser at page load +- **Google Fonts** -- DM Sans (body text), DM Mono (code/schemas), Outfit (headings) +- **No other dependencies** -- no CSS framework, no UI library, no build pipeline + +### How It Works + +1. Browser loads `index.html` +2. React and Babel load from CDN (two small JavaScript files, cached after first visit) +3. Babel compiles the inline JSX code to plain JavaScript +4. React renders the 12-tab form interface +5. All state lives in React's `useState` hooks -- nothing is sent to any server +6. When you click "Generate", two pure functions transform the form state into JSON and Markdown strings +7. You copy the output. That's it. + +### Why This Approach? + +- **Zero installation** -- open the file, it works +- **No server needed** -- everything runs client-side +- **GitHub Pages ready** -- push one file, enable Pages, done +- **No data leaves your browser** -- the generator is entirely local +- **Portable** -- download the HTML file, use it offline + +### File Structure + +``` +index.html -- The complete generator (one file, ~960 lines) +WebMCP_Model_Card_Generator_USER_GUIDE.md -- Field-by-field reference guide +webmcp-model-card-generator-v2.jsx -- React source (for running inside Claude artifacts) +README.md -- This file +``` + +### Development Process + +This generator was built in a single session through human-AI co-creation: + +1. Analyzed the [W3C WebMCP specification](https://webmachinelearning.github.io/webmcp/) (Draft, 12 Feb 2026) +2. Catalogued existing WebMCP implementations (Google travel demo, MCP-B examples, Chrome DevTools quickstart, Jason McGhee's original) +3. Mapped the existing [MCP Model Card Specification v1.0](https://github.com/Starborn/MCP-Model-Card-Generator) schema to browser-side concepts +4. Identified 12 documentation sections covering identity, API mode, tools, session, browser compatibility, security, interaction patterns, context requirements, limitations, discovery, testing, and backend relationship +5. Built the React form interface with contextual tutorial sidebar +6. Added micro-guidance to every field based on usability testing +7. Packaged as standalone HTML for GitHub Pages deployment + +## MCP vs WebMCP -- Quick Reference + +| Aspect | Anthropic MCP (Backend) | W3C WebMCP (Browser) | +|--------|------------------------|---------------------| +| Runs on | Server (Node.js, Python, Docker) | Web page (browser JavaScript) | +| Transport | stdio, HTTP+SSE, JSON-RPC | postMessage, navigator.modelContext | +| Auth | API keys, env vars, OAuth config | Inherits browser session automatically | +| Discovery | MCP Registry, npm, .well-known | Page must be visited first (gap) | +| Tool registration | Server-side code only | HTML forms OR JavaScript (or both) | +| Human oversight | Separate approval flow | Human sees the page -- IS the oversight | +| Security boundary | API key management | Same-origin policy + CSP + human-in-loop | + +## Related Projects + +- [MCP Model Card Generator](https://github.com/Starborn/MCP-Model-Card-Generator) -- the original generator for backend MCP servers +- [MCP Model Card Specification v1.0](https://github.com/Starborn/MCP-Model-Card-Generator/blob/main/MCP_Model_Card_Specification_v1_0.md) -- the specification this extends +- [W3C WebMCP Spec](https://webmachinelearning.github.io/webmcp/) -- the W3C draft specification +- [WebMCP GitHub](https://github.com/webmachinelearning/webmcp) -- the official W3C repository + +## Credits + +**Co-created by:** Paola (concept, design) and Claude (Anthropic) + +**W3C Groups:** +- [AI Knowledge Representation Community Group](https://www.w3.org/community/aikr/) +- [Web Machine Learning Community Group](https://www.w3.org/community/webmachinelearning/) + +**License:** MIT + +Released February 2026 diff --git a/README.md b/README.md index bccf9b2..9a1c5cc 100644 --- a/README.md +++ b/README.md @@ -1,618 +1,8 @@ -# WebMCP đź§Ş +https://starborn.github.io/webmcp/ (Model Card Generator) -_Enabling web apps to provide JavaScript-based tools that can be accessed by AI agents and assistive technologies to create collaborative, human-in-the-loop workflows._ +https://starborn.github.io/webmcp/webmcp-complete-guide.html (Guide) -> First published August 13, 2025 -> -> Brandon Walderman <brwalder@microsoft.com>
-> Leo Lee <leo.lee@microsoft.com>
-> Andrew Nolan <annolan@microsoft.com>
-> David Bokan <bokan@google.com>
-> Khushal Sagar <khushalsagar@google.com>
-> Hannah Van Opstal <hvanopstal@google.com> - -## TL;DR - -We propose a new JavaScript interface that allows web developers to expose their web application functionality as "tools" - JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. Web pages that use WebMCP can be thought of as [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) servers that implement tools in client-side script instead of on the backend. WebMCP enables collaborative workflows where users and agents work together within the same web interface, leveraging existing application logic while maintaining shared context and user control. - -For the technical details of the proposal, code examples, API shape, etc. see [proposal.md](./docs/proposal.md). - -## Terminology Used - -###### Agent -An autonomous assistant that can understand a user's goals and take actions on the user's behalf to achieve them. Today, -these are typically implemented by large language model (LLM) based AI platforms, interacting with users via text-based -chat interfaces. - -###### Browser's Agent -An autonomous assistant as described above but provided by or through the browser. This could be an agent built directly -into the browser or hosted by it, for example, via an extension or plug-in. - -###### AI Platform -Providers of agentic assistants such as OpenAI's ChatGPT, Anthropic's Claude, or Google's Gemini. - -###### Backend Integration -A form of API integration between an AI platform and a third-party service in which the AI platform can talk directly to -the service's backend servers without a UI or running code in the client. For example, the AI platform communicating with -an MCP server provided by the service. - -###### Actuation -An agent interacting with a web page by simulating user input such as clicking, scrolling, typing, etc. - -## Background and Motivation - -The web platform's ubiquity and popularity have made it the world's gateway to information and capabilities. Its ability to support complex, interactive applications beyond static content, has empowered developers to build rich user experiences and applications. These user experiences rely on visual layouts, mouse and touch interactions, and visual cues to communicate functionality and state. - -As AI agents become more prevalent, the potential for even greater user value is within reach. AI platforms such as Copilot, ChatGPT, Claude, and Gemini are increasingly able to interact with external services to perform actions such as checking local weather, finding flight and hotel information, and providing driving directions. These functions are provided by external services that extend the AI model’s capabilities. These extensions, or “tools”, can be used by an AI to provide domain-specific functionality that the AI cannot achieve on its own. Existing tools integrate with each AI platform via bespoke “integrations” - each service registers itself with the chosen platform(s) and the platform communicates with the service via an API (MCP, OpenAPI, etc). In this document, we call this style of tool a “backend integration”; users make use of the tools/services by chatting with an AI, the AI platform communicates with the service on the user's behalf. - -Much of the challenges faced by assistive technologies also apply to AI agents that struggle to navigate existing human-first interfaces when agent-first "tools" are not available. Even when agents succeed, simple operations often require multiple steps and can be slow or unreliable. - -The web needs web developer involvement to thrive. What if web developers could easily provide their site's capabilities to the agentic web to engage with their users? We propose WebMCP, a JavaScript API that allows developers to define tools for their webpage. These tools allow for code reuse with frontend code, maintain a single interface for users and agents, and simplify auth and state where users and agents are interacting in the same user interface. Such an API would also be a boon for accessibility tools, enabling them to offer users higher-level actions to perform on a page. This would mark a significant step forward in making the web more inclusive and actionable for everyone. - -AI agents can integrate in the backend via protocols like MCP in order to fulfill a user's task. For a web developer to expose their site's functionality this way, they need to write a server, usually in Python or NodeJS, instead of frontend JS which may be more familiar. - -There are several advantages to using the web to connect agents to services: - -* **Businesses near-universally already offer their services via the web.** - - WebMCP allows them to leverage their existing business logic and UI, providing a quick, simple, and incremental - way to integrate with agents. They don't have to re-architect their product to fit the API shape of a given agent. - This is especially true when the logic is already heavily client-side. - - -* **Enables visually rich, cooperative interplay between a user, web page, and agent with shared context.** - - Users often start with a vague goal which is refined over time. Consider a user browsing for a high-value purchase. - The user may prefer to start their journey on a specific page, ask their agent to perform some of the more tedious - actions ("find me some options for a dress that's appropriate for a summer wedding, preferably red or orange, short - or no sleeves and no embellishments"), and then take back over to browse among the agent-selected options. - -* **Allows authors to serve humans and agents from one source** - - The human-use web is not going away. Integrating agents into it prevents fragmentation of their service and allows - them to keep ownership of their interface, branding and connection with their users. - -WebMCP is a proposal for a web API that enables web pages to provide agent-specific paths in their UI. With WebMCP, agent-service interaction takes place _via app-controlled UI_, providing a shared context available to app, agent, and user. In contrast to backend integrations, WebMCP tools are available to an agent only once it has loaded a page and they execute on the client. Page content and actuation remain available to the agent (and the user) but the agent also has access to tools which it can use to achieve its goal more directly. - -![A diagram showing an agent communicating with a third-party service via WebMCP running in a live web page](./content/explainer_webmcp.png) - -In contrast, in a backend integration, the agent-service interaction takes place directly, without an associated UI. If -a UI is required it must be provided by the agent itself or somehow connected to an existing UI manually: - -![A diagram showing an agent communicating with a third-party service directl via MCP](./content/explainer_mcp.png) - -## Goals - -- **Enable human-in-the-loop workflows**: Support cooperative scenarios where users work directly through delegating tasks to AI agents or assistive technologies while maintaining visibility and control over the web page(s). -- **Simplify AI agent integration**: Enable AI agents to be more reliable and helpful by interacting with web sites through well-defined JavaScript tools instead of through UI actuation. -- **Minimize developer burden**: Any task that a user can accomplish through a page's UI can be made into a tool by re-using much of the page's existing JavaScript code. -- **Improve accessibility**: Provide a standardized way for assistive technologies to access web application functionality beyond what's available through traditional accessibility trees which are not widely implemented. - -## Non-Goals - -- **Headless browsing scenarios**: While it may be possible to use this API for headless or server-to-server interactions where no human is present to observe progress, this is not a current goal. Headless scenarios create many questions like the launching of browsers and profile considerations. -- **Autonomous agent workflows**: The API is not intended for fully autonomous agents operating without human oversight, or where a browser UI is not required. This task is likely better suited to existing protocols like [A2A](https://a2aproject.github.io/A2A/latest/). -- **Replacement of backend integrations**: WebMCP works with existing protocols like MCP and is not a replacement of existing protocols. -- **Replace human interfaces**: The human web interface remains primary; agent tools augment rather than replace user interaction. -- **Enable / influence discoverability of sites to agents** - -## Use Cases - -The use cases for WebMCP are ones in which the user is collaborating with the agent, rather than completely -delegating their goal to it. They can also be helpful where interfaces are highly specific or complicated. - -### Example - Creative - -_Jen wants to create an invitation to her upcoming yard sale so she uses her browser to navigate to -`http://easely.example`, her favorite graphic design platform. However, she's rather new to it and sometimes struggles -to find all the functionality needed for her task in the app's extensive menus. She creates a "yard sale flyer" design -and opens up a "templates" panel to look for a premade design she likes. There's so many templates and she's not sure -which to choose from so she asks her browser agent for help._ - -**Jen**: Show me templates that are spring themed and that prominently feature the date and time. They should be on a -white background so I don't have to print in color. - -_The current document has registered a WebMCP tool that the agent notices may be relevant to this query:_ - -```js -/** - * Filters the list of templates based on a description. - * - * description - A visual description of the types of templates to show, in natural language (English). - */ - filterTemplates(description) -``` - -_The agent invokes the tool: `filterTemplate("spring themed, date and time displayed prominently, white background")`. -The UI updates to show a filtered list matching this description._ - -**Agent**: Ok, the remaining templates should now match your description. - -_Jen picks a template and gets to work._ - -_The agent notices a new tool was registered when the design was loaded:_ - -```js -/** - * Makes changes to the current design based on instructions. Possible actions include modifications to text - * and font; insertion, deletion, transformation of images; placement and scale of elements. The instructions - * should be limited a single task. Here are some examples: - - * editDesign("Change the title's font color to red"); - * editDesign("Rotate each picture in the background a bit to give the design a less symmetrical feel"); - * editDesign("Add a text field at the bottom of the design that reads 'example text'"); - * - * instructions - A description of how the design should be changed, in natural language (English). - */ - editDesign(instructions) -``` - -_With all the context of Jen's prompts, page state, and this editDesign tool, the agent is able to make helpful -suggestions on next steps:_ - -**Agent**: Would you like me to make the time/date font larger? - -**Jen**: Sure. Could you also swap out the clipart for something more yard-sale themed? - -**Agent**: Sure, let me do that for you. - -**Jen**: Please fill in the time and place using my home address. The time should be in my e-mail in a message from my -husband. - -**Agent**: Ok, I've found it - I'll fill in the flyer with Aug 5-8, 2025 from 10am-3pm | 123 Queen Street West. - -_Jen is almost happy with the current design but think the heading could be better_ - -**Jen**: Help me come up with a more attention grabbing headline for the call to action and title. - -**Agent**: Of course! Here are some more attention-grabbing headlines for your yard sale flyer, broken down by title and -call to action: - -To Create Excitement: - * Yard Sale Extravaganza! - * The Ultimate Clear-Out Sale - * Mega Garage & Yard Sale - -... - -**Jen**: Lets use "Yard Sale Extravaganza!" as the title. Create copies of this page with each of the call to action -suggestions. - -_The agent takes this action using a sequence of tool calls which might look something like:_ - -* `EditDesign("Change the title text to 'Yard Sale Extravaganza!'")` -* `EditDesign("Change the call-to-action text to 'The hunt is on!'")` -* `AddPage("DUPLICATE")` -* `EditDesign("Change the call-to-action text to 'Ready, set, shop!'")` -* `AddPage("DUPLICATE")` -* `EditDesign("Change the call-to-action text to 'Come for the bargains, stay for the cookies'")` - -_Jen now has 3 versions of the same yard sale flyer. Easely implements these WebMCP tools using AI-based techniques on -their backend to allow a natural language interface. Additionally, the UI presents these changes to Jen as an easily -reversible batch of "uncommitted" changes, allowing her to easily review the agent's actions and make changes or undo as -necessary. While the site could also implement a chat interface to expose this functionality with their own agent, the -browser's agent provides a seamless journey by using tools across multiple sites/services. For example, pulling up -information from the user's email service._ - -**Agent**: Done! I've created three variations of the original design, each with a unique call to action. - -_Jen is now happy with these flyers. Normally she'd print to PDF and then take the file to a print shop. However, Easely -has a new print service that Jen doesn't know about and doesn't notice in the UI. However, the agent knows the page has -an `orderPrints` tool:_ - -```js -/** - * Orders the current design for printing and shiping to the user. - * - * copies - A number between 0 and 1000 indicating how many copies of the design to print. Required. - * page_size - The paper type to use. Available options are [Legal, Letter, A4, A5]. Default is "Letter". - * page_finish - What kind of paper finish to use. Available options are [Regular, Glossy Photo, Matte Photo]. - * Default is "Regular" - */ -orderPrints(copies, page_size, page_finish); -``` - -_The agent understands the user's intent and so surfaces a small chip in it's UI:_ - -**Agent**: `` - -_Jen is delighted she saved a trip to the store and clicks the button_. - -**Agent**: How many copies would you like? I'll request 8.5x11 sized regular paper but there are other options available. - -**Jen**: Please print 10 copies. - -**Agent**: Done! The order is ready for your review. - -_The site navigates to the checkout page where Jen can review the order and click the checkout button to confirm._ - - -### Example - Shopping - -Shopping often includes exploration, browsing, and reviewing. The agent can help and remove the tedious parts but the -user wants to be involved. - -_Maya has a few favorite clothing vendors but doesn't get formal clothes very often so she opens her AI assistant_ - -**Maya**: Suggest a few formal or semi-formal dress places where I can shop for my friend’s wedding. Please make sure -they are ecofriendly and not too expensive. - -**Agent**: Sure, here's a few suggestions - -1. Elara - uses innovative recycled and plant-based fabrics to minimize waste for a modern, chic wardrobe that allows - you to embrace the latest trends, responsibly. Price Range: Dresses typically \$90 - \$220 CAD - -2. Linden Harbour - creates wonderfully crafted womenswear designed to last a lifetime. Carries everything from classic -shirt dresses to elegant knitwear, meticulously made from the finest organic cotton, linen, and responsibly sourced -wool. Price Range: Dresses typically \$250 - \$550 CAD - -3. Wildebloom - Flowing dresses brought to life through artisanal collaborations, using natural plant dyes and panels of - upcycled vintage textiles. Price Range: Dresses typically \$180 - \$450 CAD - -**Maya**: Lets take a look at Wildebloom. - -_The agent now opens Maya's web browser, which it is integrated with. i.e. the agent can observe and control the -browser. It navigates to the dresses page on `http://wildebloom.example/shop`_ - -**Agent**: Ok, here are the dresses that Wildebloom carries. - -_Maya is immediately overwhelmed. There are so many options! Moreover, when she looks at filters she sees they're -quite limited with only colour and size as options._ - -**Maya**: Show me only dresses available in my size, and also show only the ones that would be appropriate for a -cocktail-attire wedding. - -_The agent notices the dresses page registers several tools:_ - -```js -/* - * Returns an array of product listings containing an id, detailed description, price, and photo of each - * product - * - * size - optional - a number between 2 and 14 to filter the results by EU dress size - * size - optional - a color from [Red, Blue, Green, Yellow, Black, White] to filter dresses by - */ -getDresses(size, color) - -/* - * Displays the given products to the user - * - * product_ids - An array of numbers each of which is a product id returned from getDresses - */ -showDresses(product_ids) -``` - -_The agent calls `getDresses(6)` and receives a JSON object:_ - -```json -{ - "products": [ - { - "id": 1021, - "description": "A short sleeve long dress with full length button placket...", - "price": "€180", - "image": "img_1024.png" - }, - { - "id": 4320, - "description": "A straight midi dress in organic cotton...", - "price": "€140", - "image": "img_4320.png" - }, - ... - ] -} -``` - -> [!Note] -> How to pass images and other non-textual data is something we should improve (See [Issue #41](https://github.com/webmachinelearning/webmcp/issues/41)) - -_The agent can now process this list, fetching each image, and using the user's criteria to filter the list. When -completed it makes another call, this time to `showDresses([4320, 8492, 5532, ...])`. This call updates the UI on the -page to show only the requested dresses._ - -_This is still too many dresses so Maya finds an old photo of herself in a summer dress that she really likes and shares -it with her agent._ - -**Maya**: Are there any dresses similar to the dress worn in this photo? Try to match the colour and style, but continue -to show me dresses appropriate for cocktail-attire. - -_The agent uses this image to identify several new parameters including: the colour, the fit, and the neckline and -narrows down the list to just a few dresses. Maya finds and clicks on a dress she likes._ - -_Notice, the user did not give their size, but the agent knows this from personalization and may even translate the stored -size into EU units to use it with this site._ - -### Example - Code Review - -Some services are very domain specific and/or provide a lot of functionality. A real world example is the Chromium code -review tool: Gerrit. See [CL#5142508](https://crrev.com/c/5142508). Gerrit has many features but they're not obvious just by -looking at the UI (you can press the '?' key to show a shortcut guide). In order to add a comment to a line, the user -must know to press the 'c' key. The user can suggest edits but has to open a comment to do so. Results from test runs -are available but are hidden in a generically-named "Checks" tab. - -Agents are typically trained on everyday usage so may do a poor job on more specialized, complex interfaces. However, -such sites could provide the agent with tools which serve as both a shortcut and a user manual for the agent. - -_John is a software developer and opens a code review sent from his colleague. He notices there's two red bots -indicating test failures on this patch._ - -**John**: Why are the Mac and Android bots failing? - -_The site includes some relevant tools:_ - -```js -/** - * Returns the status of each bot run in a try run job. Use the returned bot_name in getTryRunFailureSnippet - * to get more details about a run. - */ -getTryRunStatuses(); - -/** - * If a bot failed, returns the TAIL snippet of the log containing the error message describing the failure. - * - * bot_name - The name of the bot, as returned from getTryRunStatuses, to get failure snippet from - */ -getTryRunFailureSnippet(bot_name) -``` - -_The agent calls `getTryRunStatuses()` to find the names of the failing bots. It returns:_ - -```json -[ - { - "bot_name": "mac-x86-rel", - "status": "PASSED" - }, - { - "bot_name": "mac-x64-rel", - "status": "FAIL" - }, - { - "bot_name": "android-15-rel", - "status": "PASSED" - }, - ... -] -``` - -**Agent**: I see that the Mac x86 Android 10 bots are failing. Let me get more details... - -_The agent now calls `getTryRunFailureSnippet` on each of the failing bots and receives a snippet of the failing log of -each._ - -**Agent**: The Mac bot is failing to start with an "Out of Space" error. This is probaby just an ifrastructure issue. - -The Android bot is failing while linking with a missing symbol `gfx::DisplayCompositor`. - -_John realizes his colleague forgot to add the Android implementation of this class to the build file._ - -**John**: I think BUILD.gn is missing `display_compositor_android.cc`, please add a suggested edit adding it in the -Android section. - -_The agent has access to BUILD.gn from the main UI as well as in DOM. It notices it also has access to a relevant tool -which is uses to add the change:_ - -```js -/* - * Adds a suggested edit to the review - * - * filename - the name of the file in which to make the edit - * patch - the edit diff in unidiff format - */ -addSuggestedEdit(filename, patch) -``` - -**Agent**: Ok, I've added `display_compositor_android.cc` to the BUILD.gn file. - -_The UI displays the suggested diff with an option for the user to accept, modify, or reject the change. John accepts -the change._ - -_Reading the rest of the review, John notices a small issue repeated across multiple files._ - -**John**: Add a polite comment to the review that we should use "PointF" rather than "Point" for input coordinates since -the latter can cause unintended rounding. Then add suggested edits changing all instances where Point was added to -PointF. - -_The agent automates the repetitive task of making all the simple changes. The UI provides John with a visual way to -quickly review the agent's actions and accept/modify/reject them._ - -## Assumptions - -* For many sites wanting to integrate with agents quickly - augmenting their existing UI with WebMCP tools will be - easier vs. backend integration -* Agents will perform quicker and more successfully with specific tools compared to using a human interface. -* Users might use an agent for a direct action query (e.g. “create a 30 minute meeting with Pat at 3:00pm”), complex - cross-site queries (e.g. “Find the 5 highest rated restaurants in Toronto, pin them in my Map, and book a table at - each one over the next 5 weeks”) and everything in between. - -## Prior Art - -### Model Context Protocol (MCP) - -MCP is a protocol for applications to interface with an AI model. Developed by Anthropic, MCP is supported by Claude -Desktop and Open AI's Agents SDK as well as a growing ecosystem of clients and servers. - -In MCP, an application can expose tools, resources, and more to an AI-enabled application by implementing an MCP server. -The server can be implemented in various languages, as long as it conforms to the protocol. For example, here’s an -implementation of a tool using the Python SDK from the MCP quickstart guide: - -```python -@mcp.tool() -async def get_alerts(state: str) -> str: - """Get weather alerts for a US state. - - Args: - state: Two-letter US state code (e.g. CA, NY) - """ - url = f"{NWS_API_BASE}/alerts/active/area/{state}" - data = await make_nws_request(url) - - if not data or "features" not in data: - return "Unable to fetch alerts or no alerts found." - - if not data["features"]: - return "No active alerts for this state." - - alerts = [format_alert(feature) for feature in data["features"]] - return "\n---\n".join(alerts) -``` - -A client application implements a matching MCP client which takes a user’s query, communicates with one or more MCP -servers to enumerate their capabilities, and constructs a prompt to the AI platform, passing along any server-provided -tools or data. - -The MCP protocol defines how this client-server communication happens. For example, a client can ask the server to list -all tools which might return a response like this: - -```json -{ - "jsonrpc": "2.0", - "id": 1, - "result": { - "tools": [ - { - "name": "get_weather", - "description": "Get current weather information for a location", - "inputSchema": { - "type": "object", - "properties": { - "location": { - "type": "string", - "description": "City name or zip code" - } - }, - "required": ["location"] - } - } - ], - "nextCursor": "next-page-cursor" - } -} -``` - -Unlike OpenAPI, MCP is transport-agnostic. It comes with two built in transports: stdio which uses the systems standard -input/output, well suited for local communication between apps, and Server-Sent Events (SSE) which uses HTTP commands -for remote execution. - -### WebMCP (MCP-B) - -[MCP-B](https://mcp-b.ai/), or Model Context Protocol for the Browser, is an open source project found on GitHub [here](https://github.com/MiguelsPizza/WebMCP) and has much the same motivation and solution as described in this proposal. MCP-B's underlying protocol, also named WebMCP, extends MCP with tab transports that allow in-page communication between a website's MCP server and any client in the same tab. It also extends MCP with extension transports that use Chromium's runtime messaging to make a website's MCP server available to other extension components within the browser (background, sidebar, popup), and to other external MCP clients running on the same machine. MCP-B enables tools from different sites to work together, and for sites to cache tools so that they are discoverable even if the browser isn't currently navigated to the site. - -### OpenAPI - -OpenAPI is a standard for describing HTTP based APIs. Here’s an example in YAML (from the ChatGPT Actions guide): - -```yaml -openapi: 3.1.0 -info: - title: NWS Weather API - description: Access to weather data including forecasts, alerts, and observations. - version: 1.0.0 -servers: - - url: https://api.weather.gov - description: Main API Server -paths: - /points/{latitude},{longitude}: - get: - operationId: getPointData - summary: Get forecast grid endpoints for a specific location - parameters: - - name: latitude - in: path - required: true - schema: - type: number - format: float - description: Latitude of the point - - name: longitude - in: path - required: true - schema: - type: number - format: float - description: Longitude of the point - responses: - '200': - description: Successfully retrieved grid endpoints - content: - application/json: - schema: - type: object - properties: - properties: - type: object - properties: - forecast: - type: string - format: uri - forecastHourly: - type: string - format: uri - forecastGridData: - type: string - format: uri -``` - -A subset of the OpenAPI specification is used for function-calling / tool use for various AI platforms, such as ChatGPT -Actions and Gemini Function Calling. A user or developer on the AI platform would provide the platform with the OpenAPI -schema for an API they wish to provide as a “tool”. The AI is trained to understand this schema and is able to select -the tool and output a “call” to it, providing the correct arguments. Typically, some code external to the AI itself -would be responsible for making the API call and passing the returned result back to the AI’s conversation context to -reply to the user’s query. - -### Agent2Agent Protocol - -The Agent2Agent Protocol is another protocol for communication between agents. While similar in structure to MCP (client -/ server concepts that communicate via JSON-RPC), A2A attempts to solve a different problem. MCP (and OpenAPI) are -generally about exposing traditional capabilities to AI models (i.e. “tools”), A2A is a protocol for connecting AI -agents to each other. It provides some additional features to make common tasks in this scenario more streamlined, such -as: capability advertisement, long running and multi-turn interactions, and multimodal input/output. - -## Open topics - -### Security considerations - -There are security considerations that will need to be accounted for, especially if the WebMCP API is used by semi-autonomous systems like LLM-based agents. Engagement from the community is welcome. - -### Model poisoning - -Explorations should be made on the potential implications of allowing web developers to create tools in their front-end code for use in AI agents and LLMs. For example, vulnerabilities like being able to access content the user would not typically be able to see will need to be investigated. - -### Cross-Origin Isolation - -Client applications would have access to many different web sites that expose tools. Consider an LLM-based agent. It is possible and even likely that data output from one application's tools could find its way into the input parameters for a second application's tool. There are legitimate reasons for the user to want to send data across origins to achieve complex tasks. Care should be taken to indicate to the user which web applications are being invoked and with what data so that the user can intervene. - -### Permissions - -A trust boundary is crossed both when a web site first registers tools via WebMCP, and when a new client agent wants to use these tools. When a web site registers tools, it exposes information about itself and the services it provides to the host environment (i.e. the browser). When agents send tool calls, the site receives untrusted input in the parameters and the outputs in turn may contain sensitive user information. The browser should prompt the user at both points to grant permission and also provide a means to see what information is being sent to and from the site when a tool is called. To streamline workflows, browsers may give users the choice to always allow tool calls for a specific web app and client app pair. - -### Model Context Protocol (MCP) without WebMCP - -MCP has quickly garnered wide interest from the developer community, with hundreds of MCP servers being created. WebMCP is designed to work well with MCP, so that developers can reuse many of the MCP topics with their front-end website using JavaScript. We originally planned to propose an explainer very tightly aligned with MCP, providing all the same concepts supported by MCP at the time of writing, including tools, resources, and prompts. Since MCP is still actively changing, matching its exact capabilities would be an ongoing effort. Aligning the WebMCP API tightly with MCP would also make it more difficult to tailor WebMCP for non-LLM scenarios like OS and accessibility assistant integrations. Keeping the WebMCP API as agnostic as possible increases the chance of it being useful to a broader range of potential clients. - -We expect some web developers will continue to prefer standalone MCP instead of WebMCP if they want to have an always-on MCP server running that does not require page navigation in a full browser process. For example, server-to-server scenarios such as fully autonomous agents will likely benefit more from MCP servers. WebMCP is best suited for local browser workflows with a human in the loop. - -The WebMCP API still maps nicely to MCP, and exposing WebMCP tools to external applications via an MCP server is still a useful scenario that a browser implementation may wish to enable. - -### Existing web automation techniques (DOM, accessibility tree) - -One of the scenarios we want to enable is making the web more accessible to general-purpose AI-based agents. In the absence of alternatives like MCP servers to accomplish their goals, these general-purpose agents often rely on observing the browser state through a combination of screenshots, and DOM and accessibility tree snapshots, and then interact with the page by simulating human user input. We believe that WebMCP will give these tools an alternative means to interact with the web that give the web developer more control over whether and how an AI-based agent interacts with their site. - -The proposed API will not conflict with these existing automation techniques. If an agent or assistive tool finds that the task it is trying to accomplish is not achievable through the WebMCP tools that the page provides, then it can fall back to general-purpose browser automation to try and accomplish its task. - -## Future explorations - -### Progressive web apps (PWA) - -PWAs should also be able to use the WebMCP API as described in this proposal. There are potential advantages to installing a site as a PWA. In the current proposal, tools are only discoverable once a page has been navigated to and only persist for the lifetime of the page. A PWA with an app manifest could declare tools that are available "offline", that is, even when the PWA is not currently running. The host system would then be able to launch the PWA and navigate to the appropriate page when a tool call is requested. - -### Background model context providers - -Some tools that a web app may want to provide for agents and assistive technologies may not require any web UI. For example, a web developer building a "To Do" application may want to expose a tool that adds an item to the user's todo list without showing a browser window. The web developer may be content to just show a notification that the todo item was added. - -For scenarios like this, it may be helpful to combine tool call handling with something like the ['launch'](https://github.com/WICG/web-app-launch/blob/main/sw_launch_event.md) event. A client application might attach a tool call to a "launch" request which is handled entirely in a service worker without spawning a browser window. - -## Acknowledgments - -Many thanks to [Alex Nahas](https://github.com/MiguelsPizza) and [Jason McGhee](https://github.com/jasonjmcghee/) for sharing related [implementation](https://github.com/MiguelsPizza/WebMCP) [experience](https://github.com/jasonjmcghee/WebMCP). +https://starborn.github.io/webmcp/webmcp-quiz.html (Quiz) +https://starborn.github.io/webmcp/whatotest.md +https://starborn.github.io/webmcp/browserhack.md +https://starborn.github.io/webmcp/WebMCnotMCP.md diff --git a/WebMCnotMCP.md b/WebMCnotMCP.md new file mode 100644 index 0000000..529d248 --- /dev/null +++ b/WebMCnotMCP.md @@ -0,0 +1,70 @@ +# WebMCP Technical Note: WebMCP Is Not an MCP Server + +WebMCP is not MCP -- a clarification for implementers +The WebMCP README offers a useful pedagogical framing: "web pages that use WebMCP can be thought of as MCP servers that implement tools in client-side script instead of on the backend." +This analogy helps developers orient quickly. But it should not be taken as architectural equivalence, and treating it as such produces design errors. +What the analogy captures: both WebMCP and MCP expose tools that agents can invoke. At the functional surface, they look similar. +What the analogy obscures: they operate at different layers with different security models, different trust assumptions, and different governance boundaries. + +MCP is a network-layer protocol. Trust is established between client and server across a network boundary. The security model is connection-based. +WebMCP is a browser-layer architecture. Trust is mediated by the browser's origin model, user consent mechanisms, and permission policies. The security model is origin-based. +([W3C spec repo](https://github.com/webmachinelearning/webmcp)) +([McGhee implementation](https://github.com/jasonjmcghee/WebMCP)) +([Nahas MCP-B](https://github.com/MiguelsPizza/WebMCP)) + +The framing is understandable. It is also architecturally misleading, and the confusion has consequences for how developers, security reviewers, and standards participants evaluate the specification. + +## The Analogy and Its Limits + +WebMCP and Anthropic's Model Context Protocol share a conceptual ancestor: both define "tools" as functions with natural language descriptions and structured schemas that AI agents can discover and invoke. That is where the meaningful similarity ends. + +**Anthropic's MCP** is a backend protocol. It uses JSON-RPC 2.0 as its message format, transported over stdio, HTTP with Server-Sent Events, or Streamable HTTP. MCP servers are hosted processes -- typically written in Python or Node.js -- that run on backend infrastructure. They connect AI platforms like Claude, ChatGPT, or Gemini to external services. Authentication follows OAuth 2.1 or custom API key schemes. No browser is required. No human user needs to be present. Headless, fully automated operation is the norm. +([Source](https://modelcontextprotocol.io/introduction)) + +**WebMCP** is a frontend browser API. It uses the browser's native postMessage system for communication between the web page and the agent. Tools are registered and executed as client-side JavaScript within an active browser tab. Authentication is inherited from the browser session -- whatever cookies or federated login the user already has. A human user must be present in an active browser session. Headless browsing is explicitly out of scope. +([Source](https://webmachinelearning.github.io/webmcp/)) + +The specification's own language -- "can be thought of as" -- acknowledges this is an analogy, not an identity. But the README, the press coverage, and the developer ecosystem have largely dropped the qualifier. The result is that WebMCP is widely discussed as though it were MCP running in the browser, with all the assumptions that entails. + +## What the Framing Gets Wrong + +When a developer hears "your website becomes an MCP server," they import a set of assumptions from the MCP architecture. Every one of these assumptions is wrong for WebMCP. + +**Transport.** MCP uses JSON-RPC 2.0, a well-specified request-response protocol with defined error codes, batching, and notification semantics. WebMCP uses postMessage, the browser's cross-origin communication mechanism. These have different reliability characteristics, different error handling models, and different security boundaries. Code written for one transport does not work with the other. + +**Execution context.** An MCP server runs in a controlled backend environment -- a container, a VM, a serverless function -- where the service provider manages the runtime, dependencies, and resource limits. WebMCP tools run in the browser's JavaScript engine, in the same execution context as the web page's own code. They are subject to the browser's security sandbox, but also to its constraints: single-threaded execution, same-origin policy, and the full surface area of client-side attack vectors. + +**Authentication.** MCP's specification has adopted OAuth 2.1 for authentication between clients and servers. This was, notably, the problem that motivated WebMCP's creation -- Alex Nahas at Amazon found that OAuth 2.1 was impractical for internal MCP deployments. WebMCP sidesteps this entirely by inheriting the browser session. This is elegant for usability but means the authentication model is whatever the website happens to use, with no protocol-level guarantees. + +**Trust direction.** In MCP, the AI platform (client) connects to a known, registered server. The platform decides which servers to trust. In WebMCP, any website the user visits can register tools. The trust decision shifts from the AI platform to the browser, and potentially to the user -- who may not know that tools have been registered at all, since the current specification provides no visible indicator. + +**Operational mode.** MCP servers are designed for automated, programmatic access. They can run continuously, handle concurrent requests, and operate without human involvement. WebMCP requires an active browser tab with a human user present. The specification explicitly excludes headless browsing. These are fundamentally different operational paradigms with different scaling characteristics, different failure modes, and different abuse surfaces. + +## Why This Matters for Standards Review + +The "MCP server" framing is not just imprecise. It actively interferes with rigorous evaluation of the specification. + +**Security reviewers** who approach WebMCP as "MCP in the browser" will evaluate it against MCP's threat model. But MCP's threat model assumes a controlled backend environment, authenticated client-server connections, and server-side access control. WebMCP's actual threat model involves client-side JavaScript execution, browser-based trust boundaries, and the full range of web security concerns including cross-site scripting, prompt injection via tool responses, and silent tool registration. Importing the wrong threat model means asking the wrong security questions. + +**Developers** who approach WebMCP as "MCP in the browser" may expect protocol-level interoperability -- that a WebMCP tool definition could be used interchangeably with an MCP server tool definition, or that MCP client libraries could connect to WebMCP pages. They cannot. The tool schema format may be similar, but the transport, discovery, and invocation mechanisms are incompatible. + +**Standards participants** who approach WebMCP as "MCP in the browser" may underestimate the scope of new specification work required. WebMCP is not an adaptation of MCP to a new environment. It is a new browser API that borrows one concept (the tool abstraction) from MCP and implements everything else differently. It needs its own security review, its own privacy analysis, its own accessibility evaluation, and its own consent model -- none of which can be inherited from MCP. + +## What WebMCP Actually Is + +WebMCP is a proposed browser API -- specifically, a new interface on navigator.modelContext -- that allows web pages to declare JavaScript functions as tools that browser-based AI agents can discover and invoke. It uses the browser's existing communication, security, and session management infrastructure rather than introducing a new protocol. + +The design has real strengths. Authentication reuse eliminates one of the hardest problems in AI-service integration. Client-side execution means no backend infrastructure is needed. The human-in-the-loop requirement provides a natural consent and oversight mechanism -- if implemented correctly. + +But these strengths are specific to WebMCP's actual architecture, not to the MCP analogy. Evaluating WebMCP on its own terms -- as a browser API with browser security characteristics -- leads to better questions, better testing, and better specifications than evaluating it as a variant of MCP. + +## A Suggested Clarification + +The W3C specification and its README should explicitly state that WebMCP is not an implementation of the Model Context Protocol and does not use the MCP wire protocol. It borrows the "tool" abstraction -- functions with schemas and natural language descriptions -- but implements discovery, registration, invocation, and communication through browser-native mechanisms that are architecturally distinct from MCP. + +The analogy is useful for first contact. A developer unfamiliar with WebMCP can quickly grasp the concept by thinking "it is like an MCP server, but in the browser." But the specification itself, the security review, and the community evaluation should not rely on the analogy. They should address WebMCP as what it is: a new browser API with its own architecture, its own threat model, and its own design space. + +## Both Can Coexist + +None of this is an argument against WebMCP or against MCP. A company might maintain an MCP server for direct API integrations with AI platforms and simultaneously implement WebMCP tools on its consumer-facing website for browser-based agent interaction. The two are complementary, not competing, and not identical. Recognizing the distinction is necessary for evaluating each on its own merits. + diff --git a/browserhack.md b/browserhack.md new file mode 100644 index 0000000..73736de --- /dev/null +++ b/browserhack.md @@ -0,0 +1,171 @@ +# WebMCP: How a Browser Hack Became a Proposed Web Standard + +**Anthropomorphic Press -- Technical Note** +**15 February 2026** + +--- + +On February 10, 2026, Google's Chrome team launched an early preview of something called WebMCP -- a proposed web standard that would allow any website to expose structured, callable tools to AI agents through a new browser API called navigator.modelContext. Within 72 hours, the announcement had generated coverage in VentureBeat, Search Engine Land, WinBuzzer, The New Stack, and dozens of developer blogs. SEO commentators called it the biggest shift in technical SEO since structured data. +([Source](https://searchengineland.com/google-releases-preview-of-webmcp-how-ai-agents-interact-with-websites-469024)) + +Members of the W3C community group where the specification is supposedly being incubated however learned about it from that same press coverage -- not from the group itself. + +This technical note reconstructs the chain of events. + +## What Is WebMCP + +WebMCP (Web Model Context Protocol) is a proposed JavaScript API that allows web developers to expose their web application functionality as "tools" -- JavaScript functions with natural language descriptions and structured schemas that can be invoked by AI agents, browser assistants, and assistive technologies. As the specification states: "Web pages that use WebMCP can be thought of as Model Context Protocol servers that implement tools in client-side script instead of on the backend." +([Source](https://webmachinelearning.github.io/webmcp/)) + +The specification proposes two APIs. A Declarative API handles standard actions that can be defined directly in HTML forms. An Imperative API handles more complex, dynamic interactions requiring JavaScript execution through navigator.modelContext.registerTool(). Together they allow web pages to function as tool servers for AI agents, running entirely client-side in the browser. +([Source](https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md)) + +Critically, WebMCP is not the same thing as Anthropic's Model Context Protocol (MCP), despite sharing a name fragment and conceptual lineage. The two protocols operate in different layers and serve different purposes. Anthropic's MCP is a backend protocol: it uses JSON-RPC for client-server communication, runs on hosted servers (typically in Python or Node.js), connects AI platforms like Claude or ChatGPT to external services, and does not require a browser or a human user to be present. WebMCP is a frontend, browser-native API: it runs entirely client-side in JavaScript, uses the browser's postMessage system for communication, requires an active browser session with a human user present, and exposes website functionality directly to agents operating within that browser context. A company might use both: an MCP server for direct API integrations with AI platforms, and WebMCP tools on its consumer-facing website so that browser-based agents can interact with the site while the user is actively browsing. The two are complementary, not competing. +([Source](https://webmachinelearning.github.io/webmcp/) and [Source](https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md)) + +The WebMCP specification explicitly declares that headless browsing, fully autonomous agents, and backend service integration are out of scope. "Headless browsing" refers to running a browser without a visible interface -- no screen, no human watching -- as automated tools like Puppeteer and Playwright do. By excluding this, WebMCP requires that a human user is present in an active browser session whenever agents invoke tools. This is a deliberate design choice: the standard is built around cooperative, human-in-the-loop workflows, not unsupervised automation. +([Source](https://webmachinelearning.github.io/webmcp/)) + +## Origin: From Amazon Frustration to Browser Hack + +The concept traces back to Alex Nahas, a backend engineer at Amazon. When Anthropic's MCP arrived in early 2025, Amazon spun up what amounted to one enormous MCP server with thousands of tools. The real problem, however, was authorization: MCP's spec had adopted OAuth 2.1, which essentially nobody at Amazon had implemented internally. Every internal service had its own authentication story. +([Source](https://www.arcade.dev/blog/web-mcp-alex-nahas-interview)) + +Nahas realized the browser itself could solve the auth problem -- users are already signed in through federated browser sessions. He developed MCP-B (Model Context Protocol for Browsers), a Chrome extension that let websites expose MCP-compatible tools directly through browser JavaScript, using existing authentication and security models. The underlying protocol he called "WebMCP." +([Source](https://github.com/MiguelsPizza/WebMCP)) +([Source](https://docs.mcp-b.ai/introduction)) + +Independently, a separate early implementation by Jason McGhee also used the name "WebMCP" for a similar concept -- a widget allowing any website to act as an MCP server client-side. McGhee has since deferred to the W3C version, noting his implementation is "not compliant with the W3C spec" and that "much more capable folks that develop the web" have taken up the work. +([Source](https://github.com/jasonjmcghee/WebMCP)) + +## Google and Microsoft Enter: The W3C Pathway + +On August 28, 2025, Patrick Brosset of the Microsoft Edge team published a blog post introducing WebMCP as "a proposal to let you, web developers, control how AI agents interact with your web pages." He described it as a joint effort between the Edge and Google teams and solicited early feedback. +([Source](https://patrickbrosset.com/articles/2025-08-28-ai-agents-and-the-web-a-proposal-to-keep-developers-in-the-loop/)) + +In an interview published by The New Stack, Kyle Pflug, group product manager for the web platform at Microsoft Edge, confirmed that WebMCP was a joint Microsoft-Google initiative. Pflug noted that Alex Nahas had joined the group, and that the priority for the rest of 2025 was "deeper conversations with web developers" and working toward "an early developer preview in Chromium." +([Source](https://thenewstack.io/how-webmcp-lets-developers-control-ai-agents-with-javascript/)) + +The specification was placed in the GitHub repository of the W3C Web Machine Learning Community Group at github.com/webmachinelearning/webmcp. The repo shows open issues dating from October-November 2025, primarily filed by Khushal Sagar of Google (a spec editor), with issue tracker activity through at least November 17, 2025. +([Source](https://github.com/webmachinelearning/webmcp/issues)) + +## Where Is It Housed: The Web Machine Learning Community Group + +The specification itself states: "This specification was published by the Web Machine Learning Community Group. It is not a W3C Standard nor is it on the W3C Standards Track." The draft is dated February 12, 2026 and lists three editors: Brandon Walderman (Microsoft), Khushal Sagar (Google), and Dominic Farolino (Google). +([Source](https://webmachinelearning.github.io/webmcp/)) + +The Web Machine Learning Community Group was originally proposed on October 3, 2018 by Anssi Kostiainen (Intel) to incubate the Web Neural Network API. It has since expanded its charter to include additional deliverables. The updated charter now lists the "WebMCP API" as a specification deliverable, described as "An API for web apps to expose their functionality as tools to AI agents and assistive technologies." +([Source](https://webmachinelearning.github.io/charter/)) + +The charter also notes that the WebML CG "should coordinate with" the AI Agent Protocol Community Group "to ensure these protocols consider WebMCP API requirements, as applicable." +([Source](https://webmachinelearning.github.io/charter/)) + +The CG participant lists show that Google LLC, Microsoft Corporation, Intel Corporation, Samsung, Apple Inc., Huawei, and others have representatives in the group. +([CG participants](https://www.w3.org/groups/cg/webmachinelearning/participants/)) +([WG participants](https://www.w3.org/groups/wg/webmachinelearning/participants/)) + +The webmachinelearning GitHub organization (https://github.com/webmachinelearning) hosts the WebMCP repo alongside the Web Neural Network API (WebNN), Prompt API, Translation API, Writing Assistance APIs, and Proofreader API. The webmcp repo shows 436 stars and 21 forks as of this writing, with its last update on December 12, 2025. +([Source](https://github.com/orgs/webmachinelearning/repositories)) + +Separately, a "WebMCP-org" GitHub organization (https://github.com/WebMCP-org) hosts MCP-B-related implementation code, npm packages, and example applications, including React hooks and transport layers. This is the implementation side, distinct from the specification repo. +([Source](https://github.com/WebMCP-org)) + +## The Chrome Launch: February 10, 2026 + +On February 10, 2026, Google's Andre Cipriani Bandarra announced the WebMCP Early Preview Program. The announcement stated that WebMCP aims to provide a standard way for exposing structured tools, ensuring AI agents can perform actions with increased speed, reliability, and precision. Access to the preview is available in Chrome 146 Canary behind the "WebMCP for testing" flag at chrome://flags. +([Source](https://searchengineland.com/google-releases-preview-of-webmcp-how-ai-agents-interact-with-websites-469024)) + +The announcement generated immediate and extensive press coverage. VentureBeat reported the specification was transitioning from community incubation within the W3C to a formal draft, and noted the comparison drawn by Chrome staff engineer Khushal Sagar that WebMCP aims to become "the USB-C of AI agent interactions with the web." +([Source](https://venturebeat.com/infrastructure/google-chrome-ships-webmcp-in-early-preview-turning-every-website-into-a)) + +WinBuzzer reported early benchmarks showing approximately 67% reduction in computational overhead compared to traditional visual agent-browser interactions. No other browser vendor has announced implementation timelines, though Microsoft's co-authorship suggests Edge support is probable. +([Source](https://winbuzzer.com/2026/02/13/google-chrome-webmcp-early-preview-ai-agents-xcxwbn/)) + +## The Process Question + +The technical merits of WebMCP are not the concern raised here. The concern is procedural. + +The specification is described as being incubated by a W3C Community Group. A CG draft carries a specific meaning in W3C process -- it is a collaborative, community-driven document subject to group deliberation, review, and consensus-building. Yet the Chrome team shipped a working implementation in Chrome 146 and launched a public developer program before the specification was mature. The spec on GitHub contains multiple sections marked TODO. Open issues remain unresolved. +([Source](https://github.com/webmachinelearning/webmcp/issues)) + +As one critical assessment noted, Chrome's unilateral advancement through an early preview program raises questions about whether competing browser vendors will adopt compatible approaches, and the announcement provides limited technical documentation about API structure, authentication mechanisms, or permission models. +([Source](https://ppc.land/chromes-webmcp-could-end-ai-agents-pixel-parsing-nightmare/)) + +Members of the Web Machine Learning Community Group report learning about WebMCP from external press coverage rather than through the group's own communication channels. This raises the question of whether the CG incubation process served as genuine community deliberation or as a hosting arrangement for a specification driven primarily by two browser vendors. + +This pattern -- browser vendors shipping early implementations to generate developer momentum while the standardization process is still in progress -- is not new. It has a long history in web standards. But it raises particular questions when applied to AI agent infrastructure, where the security, privacy, and trust implications of exposing website functionality to autonomous systems are significant and largely unresolved. + +## De Facto Standard in the Making? + +To be precise about the status: WebMCP is a Draft Community Group Report. The W3C FAQ explicitly states that "Community and Business Group Reports are not yet W3C Standards" and that groups should not refer to CG work as "standards work" or "draft standards." CG Reports are not W3C Recommendations and do not carry the weight of the W3C Recommendation Track process. +([Source](https://www.w3.org/community/about/faq/)) + +Yet Chrome 146 already ships a working implementation behind a feature flag. This is the classic pattern of a de facto standard: ship the implementation, build developer adoption, and the specification follows the code rather than the other way around. The question is whether the community process will shape the final standard, or whether the Chrome implementation will become the reference that everyone else must follow. + +It should be noted that WebMCP was discussed at W3C TPAC 2025 in Kobe, Japan, within the Web Machine Learning Community Group sessions. The W3C blog reports that WebMCP was "a major topic" including "considerations about how to manage consent and permissions for sensitive actions in a WebMCP context." +([Source](https://www.w3.org/blog/ -- TPAC 2025 report)) + +## How to Contribute + +For developers, standards practitioners, and anyone concerned about how AI agents will interact with the web, here are the concrete pathways to participate in shaping WebMCP: + +**1. Join the W3C Web Machine Learning Community Group.** W3C Community Groups are open to all. No W3C Membership is required and there is no fee. You need a free W3C account and must agree to the W3C Community Contributor License Agreement (CLA). Join at: https://www.w3.org/community/webmachinelearning/ +([Source](https://www.w3.org/community/about/)) + +**2. File issues and contribute via GitHub.** The charter states that participants should make all contributions in the GitHub repo, by pull request (preferred), by raising an issue, or by commenting on an existing issue. The spec repo is at: https://github.com/webmachinelearning/webmcp +([Source](https://webmachinelearning.github.io/charter/)) + +**3. Test the Chrome implementation and provide feedback.** The early preview is available in Chrome 146 Canary by enabling the "WebMCP for testing" flag at chrome://flags. Developers can apply for access to documentation and demos through Google's Early Preview Program. + +**4. Engage with the AI Agent Protocol Community Group.** The WebML CG charter identifies coordination with this separate CG, which develops protocols for AI agent discovery and collaboration across the web. If you work on agent interoperability, this is a relevant touchpoint. + +## What to Watch + +Several questions remain open. How will WebMCP interact with existing accessibility frameworks, given the specification's claim that the API would benefit assistive technologies? How will rate limiting and abuse prevention work when agents can invoke website tools programmatically? What happens when prompt injection meets client-side tool registration? And critically -- will the W3C community group process catch up to the Chrome implementation, or will the implementation become the de facto standard regardless of community input? The specification is open. The implementation is live. The window for community influence is now. + +## Chronology + +- **October 3, 2018** -- Web Machine Learning Community Group proposed at W3C by Anssi Kostiainen (Intel) to incubate Web Neural Network API. ([Source](https://www.w3.org/community/webmachinelearning/)) + +- **Early 2025** -- Anthropic's Model Context Protocol gains adoption. Alex Nahas at Amazon encounters OAuth 2.1 authorization problems with internal MCP deployment. ([Source](https://www.arcade.dev/blog/web-mcp-alex-nahas-interview)) + +- **2025** -- Alex Nahas develops MCP-B Chrome extension; Jason McGhee independently develops early WebMCP widget. Both demonstrate browser-based MCP feasibility. ([Source](https://github.com/MiguelsPizza/WebMCP) and [Source](https://github.com/jasonjmcghee/WebMCP)) + +- **August 28, 2025** -- Patrick Brosset (Microsoft Edge) publishes blog post introducing WebMCP as joint Edge-Google proposal, soliciting early developer feedback. ([Source](https://patrickbrosset.com/articles/2025-08-28-ai-agents-and-the-web-a-proposal-to-keep-developers-in-the-loop/)) + +- **September-October 2025** -- Kyle Pflug (Microsoft Edge) confirms joint initiative in interview with The New Stack. Alex Nahas joins the group. Specification placed in webmachinelearning GitHub org. ([Source](https://thenewstack.io/how-webmcp-lets-developers-control-ai-agents-with-javascript/)) + +- **October-November 2025** -- Open issues filed on webmachinelearning/webmcp repo, primarily by spec editor Khushal Sagar (Google). Issues tagged "Agenda+" for CG discussion. ([Source](https://github.com/webmachinelearning/webmcp/issues)) + +- **December 12, 2025** -- Last recorded update to webmcp repo on GitHub. ([Source](https://github.com/orgs/webmachinelearning/repositories)) + +- **February 10, 2026** -- Google launches WebMCP Early Preview Program in Chrome 146 Canary. Press coverage erupts. ([Source](https://searchengineland.com/google-releases-preview-of-webmcp-how-ai-agents-interact-with-websites-469024)) + +- **February 12, 2026** -- WebMCP specification dated as "Draft Community Group Report." ([Source](https://webmachinelearning.github.io/webmcp/)) + +## Reference URLs + +- WebMCP specification (Draft CG Report, 12 Feb 2026): https://webmachinelearning.github.io/webmcp/ +- WebMCP proposal/explainer: https://github.com/webmachinelearning/webmcp/blob/main/docs/proposal.md +- WebMCP GitHub repo (webmachinelearning org): https://github.com/webmachinelearning/webmcp +- Open issues: https://github.com/webmachinelearning/webmcp/issues +- WebML CG charter (lists WebMCP as deliverable): https://webmachinelearning.github.io/charter/ +- WebML CG home page: https://www.w3.org/community/webmachinelearning/ +- WebML CG participants: https://www.w3.org/groups/cg/webmachinelearning/participants/ +- WebML WG participants: https://www.w3.org/groups/wg/webmachinelearning/participants/ +- webmachinelearning GitHub org: https://github.com/orgs/webmachinelearning/repositories +- Brosset blog post (Aug 28, 2025): https://patrickbrosset.com/articles/2025-08-28-ai-agents-and-the-web-a-proposal-to-keep-developers-in-the-loop/ +- The New Stack interview with Pflug: https://thenewstack.io/how-webmcp-lets-developers-control-ai-agents-with-javascript/ +- Arcade.dev Nahas interview: https://www.arcade.dev/blog/web-mcp-alex-nahas-interview +- Nahas MCP-B repo: https://github.com/MiguelsPizza/WebMCP +- McGhee early WebMCP: https://github.com/jasonjmcghee/WebMCP +- WebMCP-org GitHub: https://github.com/WebMCP-org +- MCP-B docs: https://docs.mcp-b.ai/introduction +- Search Engine Land (Feb 11): https://searchengineland.com/google-releases-preview-of-webmcp-how-ai-agents-interact-with-websites-469024 +- VentureBeat (Feb 12): https://venturebeat.com/infrastructure/google-chrome-ships-webmcp-in-early-preview-turning-every-website-into-a +- WinBuzzer (Feb 13): https://winbuzzer.com/2026/02/13/google-chrome-webmcp-early-preview-ai-agents-xcxwbn/ +- PPC Land (Feb 15): https://ppc.land/chromes-webmcp-could-end-ai-agents-pixel-parsing-nightmare/ + +--- + +*Anthropomorphic Press, indexed in Dow Jones Factiva. CWRE* diff --git a/index.html b/index.html new file mode 100644 index 0000000..fc80ce8 --- /dev/null +++ b/index.html @@ -0,0 +1,966 @@ + + + + + + WebMCP Model Card Generator + + + + + + + + + + +
+ + + diff --git a/webmcp-complete-guide.html b/webmcp-complete-guide.html new file mode 100644 index 0000000..4165467 --- /dev/null +++ b/webmcp-complete-guide.html @@ -0,0 +1,687 @@ + + + + + +WebMCP -- The Complete Guide + + + + +
+ +
+

WebMCP: Everything You Need to Know

+
A plain-language technical guide to Google's proposed browser API for AI agent-website interaction
+
+ + + + +
+
Section 01
+

The Big Picture -- What Problem Does WebMCP Solve?

+ +

Right now, when an AI agent (like me, or ChatGPT, or Gemini) wants to do something on a website -- book a flight, fill a form, check a price -- it has two bad options:

+ +

Option A: Screen scraping. The agent looks at the website like a human would, tries to figure out where the buttons are, and clicks them. This is fragile, slow, and breaks whenever the website changes its layout. It is like trying to operate a machine by looking at a photo of the control panel.

+ +

Option B: Backend API. The website builds a separate server-side MCP server that the agent connects to. This works well but requires backend engineering, server infrastructure, and maintenance. Many websites will never do this.

+ +

WebMCP is Option C: The website itself tells the agent what it can do, directly in the browser. The website says: "Here are my tools -- you can search products, add to cart, check availability. Here is what each tool needs as input, and here is what it will give you back." The agent does not need to look at the screen. It just calls the tools.

+ +
A restaurant menu. Instead of the AI agent walking into the kitchen and trying to figure out how to cook, the website hands it a menu: "Here is what we serve, here is what each dish needs, here is how to order." The agent reads the menu and places orders.
+ +
WebMCP makes any website into an AI-friendly service, with no backend needed. The website's existing JavaScript code does the work. The AI agent just needs to know what tools are available.
+
+ + +
+
Section 02
+

The Key Players and How They Relate

+ +

Agent

+

An autonomous assistant that understands goals and takes actions. Today, these are typically LLM-based: Claude, ChatGPT, Gemini. The agent is the one calling the tools that websites expose.

+ +

Browser's Agent

+

An agent that lives inside the browser itself, rather than in a separate app. Google is building this into Chrome (think of it as an AI assistant built into your browser toolbar). This is different from an external agent like Claude Desktop connecting to the browser.

+ +

AI Platform

+

The company providing the agent -- Anthropic, OpenAI, Google. The AI platform's agent connects to WebMCP tools.

+ +

Web Developer

+

The person who builds the website. They are the ones who will use WebMCP to register tools on their site.

+ +

User

+

The human sitting at the browser. WebMCP is designed for "user-present" interactions -- the human is there, watching, and can be asked for confirmation before the agent does something important.

+ +
The user is at a restaurant (the website). The agent is their personal assistant, reading the menu (WebMCP tools) and placing orders on their behalf. The browser is the restaurant building. The AI platform is the agency that employs the assistant.
+
+ + +
+
Section 03
+

MCP vs WebMCP vs MCP-B -- The Family Tree

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
WhatWho made itWhere it runsWhat it does
MCP
(Model Context Protocol)
AnthropicOn a server (backend)The original protocol. Applications expose tools, resources, and prompts to AI models through a server that runs on the backend. Claude Desktop, OpenAI Agents SDK, and many others support it.
WebMCP
(Web Model Context Protocol)
W3C Web Machine Learning Community Group (Google, Microsoft engineers leading)In the browser (frontend)Adapts MCP concepts for the web. Websites expose tools through JavaScript in the browser. No backend server needed. Uses the browser's own security model. Currently a draft specification.
MCP-B
(MCP for Browser)
Community project (WebMCP-org on GitHub)Browser extension + JavaScript libraryA bridge. Since browsers do not natively support WebMCP yet, MCP-B provides a polyfill (temporary code that fills the gap) implementing the navigator.modelContext API, and translates between WebMCP format and the MCP wire protocol so existing MCP clients can talk to WebMCP-enabled sites.
+ +
MCP is the foundation protocol (backend). WebMCP brings the same ideas to the browser (frontend). MCP-B is the bridge that makes WebMCP work today before browsers add native support. They are complementary, not competing.
+
+ + +
+
Section 04
+

The API -- Every Term Explained

+ +

The WebMCP API is surprisingly small. There are only a few pieces, and each one does something specific. Here they are:

+ +

navigator.modelContext

+

This is the entry point. navigator is a built-in browser object that gives access to browser features (like navigator.geolocation gives access to GPS). WebMCP adds modelContext to it. So navigator.modelContext is where all WebMCP functionality lives.

+ +
The navigator object is like the browser's control panel. modelContext is a new button on that control panel labeled "AI Tools."
+ +

Four Methods (Actions You Can Take)

+ +
+
provideContext(options)
+
Registers a complete set of tools all at once. If there were any tools registered before, it clears them first and replaces with the new set. Use this when you want to say: "Here is everything this page offers."
+
+ +
+
clearContext()
+
Removes all registered tools. The page goes quiet -- no tools available for agents. Use this when navigating away or when the page should stop offering AI-callable functionality.
+
+ +
+
registerTool(tool)
+
Adds one single tool to the existing set without removing anything. Use this when you want to add new capabilities dynamically -- for example, a "checkout" tool that only appears after the user adds items to their cart.
+
+ +
+
unregisterTool(name)
+
Removes one specific tool by its name. Use this when a capability is no longer available -- for example, removing the "apply discount" tool after the discount has been applied.
+
+ +
provideContext = "here is everything" (replaces all). registerTool = "add one more" (keeps existing). clearContext = "remove everything." unregisterTool = "remove just this one."
+ +

The Tool Object -- What a Tool Looks Like

+ +

Every tool you register has these parts:

+ +
+
name
+
A unique identifier, like "addToCart" or "searchProducts". The agent uses this name to call the tool. Must be unique on the page -- you cannot have two tools with the same name.
+
+ +
+
description
+
A natural language explanation of what the tool does. This is what the AI agent reads to decide whether to use this tool. Example: "Add a product to the shopping cart by product ID and quantity." Write it for an AI, not for a programmer.
+
+ +
+
inputSchema
+
A JSON Schema describing what inputs the tool expects. It says: "I need a productId (text) and a quantity (number, minimum 1)." The agent reads this to know what data to send. If the agent sends the wrong kind of data, the browser rejects it.
+
+ +
+
execute
+
The actual function that runs when the agent calls the tool. This is your website's existing JavaScript code -- the same code that runs when a human clicks a button. The function receives the input data and returns a result.
+
+ +
+
annotations (optional)
+
Extra metadata about the tool. Currently only one annotation exists: readOnlyHint. If set to true, it tells the agent: "This tool only reads data -- it does not change anything." This helps agents decide which tools are safe to call without asking the user first.
+
+ +

Here is what a complete tool registration looks like in code:

+ +
// Register a tool that searches products on an e-commerce site +navigator.modelContext.registerTool({ + name: 'searchProducts', + description: 'Search for products by keyword, category, or price range', + inputSchema: { + type: 'object', + properties: { + query: { type: 'string', description: 'Search keywords' }, + maxPrice: { type: 'number', description: 'Maximum price filter' } + }, + required: ['query'] + }, + annotations: { readOnlyHint: true }, // Safe -- only reads, doesn't change anything + async execute(input, client) { + // This calls the site's existing search function + const results = await searchAPI(input.query, input.maxPrice); + return { products: results }; + } +});
+ +

ModelContextClient -- The Agent's Identity

+ +
+
ModelContextClient
+
When an agent calls a tool, the execute function receives two things: the input data, and a client object representing the agent. This client object has one crucial method: requestUserInteraction().
+
+ +
+
requestUserInteraction(callback)
+
This is the human-in-the-loop mechanism. During tool execution, the code can pause and ask the user for input. For example: "The agent wants to purchase this item for $49.99. Confirm?" The user clicks yes or no, and the tool continues or cancels based on their response.
+
+ +
Your personal assistant calls the restaurant to make a reservation. Midway through, the assistant says: "They only have a table at 9pm instead of 8pm. Should I take it?" You say yes or no. That pause-and-ask is requestUserInteraction.
+ +
The requestUserInteraction mechanism provides human-in-the-loop consent for consequential actions. An open question for the specification is whether there should also be a preview or approval step before tools are even discoverable by agents.
+
+ + +
+
Section 05
+

Security -- How WebMCP Stays Safe

+ +

Origin-Based Security

+

The web has a concept called "origin" -- it is the combination of protocol + domain + port. For example, https://amazon.com is one origin, and https://evil-site.com is a different origin. Browsers enforce strict rules about what one origin can access from another.

+ +

WebMCP inherits this model. A tool registered on amazon.com can only access amazon.com's data. An agent calling that tool operates within amazon.com's security boundary. A malicious site cannot register tools that access another site's data.

+ +
Each website is like a separate building with its own locks and keys. WebMCP tools can only open doors inside their own building. They cannot reach into the building next door.
+ +

SecureContext Requirement

+

The spec requires SecureContext, which means WebMCP only works on HTTPS pages (encrypted connections). It will not work on plain HTTP. This prevents eavesdropping on tool calls.

+ +

User-Present Model

+

WebMCP is designed for situations where the user is present at the browser. This is different from server-side MCP, where agents might operate autonomously in the background. The user-present assumption is why requestUserInteraction() exists -- the spec expects a human to be available for confirmation.

+ +
WebMCP's security comes from three layers: origin isolation (each site is sandboxed), HTTPS requirement (encrypted connections), and user-present design (human in the loop). It builds on what the web already does rather than inventing new security from scratch.
+
+ + +
+
Section 06
+

The Consent Gap -- A Key Open Question

+ +

A key question for the WebMCP specification:

+ +

Currently, any website can register any number of tools the moment a user visits it. An AI agent connected to the browser can immediately discover and potentially call those tools. There is no step where the user sees: "This website wants to expose 12 tools to your AI agent. Allow?"

+ +

Compare this to how other browser capabilities evolved:

+ + + + + + + + + + + + + + + + + + + + + + +
CapabilityPermission model
Camera / MicrophoneBrowser shows a prompt: "This site wants to use your camera. Allow / Block"
Location (GPS)Browser shows a prompt: "This site wants to know your location. Allow / Block"
NotificationsBrowser shows a prompt: "This site wants to send you notifications. Allow / Block"
WebMCP toolsCurrently: no prompt. Tools are silently registered and discoverable.
+ +

This does not mean WebMCP is dangerous right now. The requestUserInteraction() mechanism provides per-action consent. But it means an agent could discover tools without the user knowing, even if it needs permission to execute them.

+ +
This is a design question, not a criticism. Does the spec team envision a permission layer for tool discovery, or is the current thinking that the AI client (Claude, ChatGPT) handles that at its own level? Both approaches are valid -- the intended architecture matters for implementers and for user trust.
+
+ + +
+
Section 07
+

Five Quality Tools for the MCP Ecosystem

+ +

Five open-source tools that work together as a quality pipeline for the MCP ecosystem:

+ +
+
+
1. MCP Server Generator
+
You describe what you want your MCP server to do, and this tool generates production-ready code for you. Like a scaffold builder -- it creates the structure so you just fill in the custom logic.
+
github.com/Starborn/MCP-Server-Generator
+
+ +
+
2. MCP Server Validator
+
Checks your MCP server code for problems without running it. Finds hardcoded passwords, missing security, naming mistakes, known vulnerability patterns. Gives you a score from Critical (below 25%) to Excellent (90-100%) with specific fix instructions.
+
github.com/Starborn/MCP-Server-Validator
+
+ +
+
3. MCP Model Card Generator
+
Creates standardized documentation for your MCP server. Like a product data sheet -- it captures what the server does, what tools it offers, what security it has, how it performs. Outputs both JSON (for machines) and Markdown (for humans).
+
github.com/Starborn/MCP-Model-Card-Generator
+
+ +
+
4. MCP Model Card Specification v1.0
+
The formal definition of what a model card should contain. Six sections: server identity, tool documentation, operational characteristics, security profile, deployment context, evaluation results. This is the standard that the generators follow.
+
starborn.github.io/MCP-Model-Card-Generator/
+
+ +
+
5. WebMCP Model Card Generator
+
The newest tool. Like #3 but specifically for browser-side WebMCP tools instead of backend MCP servers. Has 12 sections covering browser-specific concerns: navigator.modelContext API modes, origin-based security, user interaction patterns, browser compatibility testing. Built within five days of the WebMCP spec being published.
+
starborn.github.io/webmcp/
+
+
+ +
Tools 1-4 are for backend MCP servers. Tool 5 is for browser-side WebMCP tools. Together they cover the entire ecosystem -- both server-side and client-side AI tool infrastructure.
+ +
A separate generator exists for WebMCP because browser-side tools have fundamentally different concerns from backend servers: origin security instead of API keys, no server infrastructure, user-present interaction patterns. The documentation fields differ because the engineering context differs.
+
+ + +
+
Section 08
+

The Standards Process -- Where This Is Going

+ +

Current Status

+

WebMCP is a Draft Community Group Report. In W3C terms, this means it is a proposal being discussed in a Community Group (the Web Machine Learning CG). It is not yet on the W3C Standards Track, and it is not a W3C Recommendation (the final stage of a web standard).

+ +

What That Means Practically

+

The spec is early and open to change. This is exactly the right time to contribute -- before designs are locked in. The spec team is actively soliciting feedback.

+ +

The Path Forward

+

Typically: Community Group Report leads to a Working Group charter, which leads to a Working Draft, then Candidate Recommendation, then full W3C Recommendation. This process takes years. Chrome may implement experimental support (behind a flag) much sooner.

+ +

Contributing

+

The W3C community structure provides established channels for participation. Technical notes, tooling, and quality infrastructure are complementary contributions that help the specification succeed by addressing practical implementation concerns.

+ +
+ + +
+
Section 09
+

Glossary -- Every Technical Term in Plain Language

+ +
+
API (Application Programming Interface)
+
A set of rules for how software talks to other software. WebMCP is an API -- it defines how websites talk to AI agents.
+
+ +
+
AST (Abstract Syntax Tree)
+
A structured representation of code that lets you analyze it without running it. The MCP Server Validator uses AST analysis to find problems in MCP server code safely.
+
+ +
+
Callback
+
A function you hand to someone else to run later. In WebMCP, the execute function is a callback -- you define it, but the agent triggers it when it calls your tool.
+
+ +
+
Client-side / Frontend
+
Code that runs in the user's browser, on their device. WebMCP tools run client-side. Contrast with server-side / backend.
+
+ +
+
Dictionary (in WebIDL)
+
A structured bundle of named values. ModelContextTool is a dictionary -- it bundles together a name, description, schema, and execute function into one package.
+
+ +
+
DOM (Document Object Model)
+
The browser's internal representation of a web page. When JavaScript modifies a page, it changes the DOM.
+
+ +
+
DOMString
+
Just a text string in browser terms. When the spec says a tool's name is a DOMString, it means it is text.
+
+ +
+
Exposed=Window
+
Means this feature is available in regular web pages (as opposed to service workers or other background contexts). WebMCP tools only work in normal browser tabs where a user is present.
+
+ +
+
Interface
+
A blueprint defining what methods and properties an object has. ModelContext is an interface -- it defines that any modelContext object will have provideContext, clearContext, registerTool, and unregisterTool methods.
+
+ +
+
JSON Schema
+
A standard way to describe the shape of data. When a tool says its inputSchema requires a "query" string and an optional "maxPrice" number, that is JSON Schema. It lets the agent know what data to send.
+
+ +
+
Navigator
+
A built-in browser object that provides access to browser features. You already use navigator.geolocation (GPS), navigator.clipboard (copy/paste). WebMCP adds navigator.modelContext (AI tools).
+
+ +
+
Origin
+
The identity of a website: protocol + domain + port. https://amazon.com:443 is one origin. Two different origins cannot access each other's data. This is the foundation of web security and the foundation of WebMCP security.
+
+ +
+
Polyfill
+
Temporary code that provides a feature before browsers add native support. MCP-B is a polyfill for WebMCP -- it makes navigator.modelContext work today even though browsers have not implemented it natively yet.
+
+ +
+
Promise
+
A way to handle things that take time. When a tool's execute function returns a Promise, it means: "I am working on it and will give you the result when I am done." The agent waits for the Promise to resolve.
+
+ +
+
SameObject
+
Every time you access navigator.modelContext, you get the exact same object -- not a copy. This ensures all tool registrations go to the same place.
+
+ +
+
SecureContext
+
Means the feature only works on HTTPS pages (encrypted connection). No WebMCP on unencrypted HTTP. This is a security requirement.
+
+ +
+
Server-side / Backend
+
Code that runs on a remote server, not in the user's browser. Traditional MCP servers run server-side. WebMCP specifically avoids this -- tools run in the browser.
+
+ +
+
Tool Poisoning
+
A security attack where a malicious MCP server exposes tools with misleading descriptions to trick agents into performing harmful actions. The MCP Server Validator detects patterns associated with this.
+
+ +
+
Transport
+
The mechanism for sending messages between systems. MCP uses different transports (stdio, HTTP). MCP-B adds "tab transport" (communication within a browser tab) and "extension transport" (communication through browser extensions).
+
+ +
+
WebIDL (Web Interface Definition Language)
+
The formal language used to write web API specifications. When you see code blocks in the spec with words like interface, dictionary, readonly attribute -- that is WebIDL. It is the blueprint language for browser APIs.
+
+ +
+
Wire Protocol
+
The actual format of messages sent between systems. MCP's wire protocol uses JSON-RPC (structured messages in JSON format). MCP-B translates between WebMCP's browser-native format and MCP's wire protocol.
+
+ +
+ + +
+ Prompted by Paola Di Maio, W3C AI-KR Community Group
+ Prepared by Claude | Contributed to the WebML CG
+ February 2026 +
+ + +
+ + diff --git a/webmcp-quiz.html b/webmcp-quiz.html new file mode 100644 index 0000000..05003a5 --- /dev/null +++ b/webmcp-quiz.html @@ -0,0 +1,526 @@ + + + + + +WebMCP & MCP Tooling -- Quiz by Claude + + + + +
+
+

INTRO: WebMCP & MCP Tooling

+

15 questions -

+
+ +
+ Question 0 / 15 + Score: 0 +
+ +
+ +
+

Meeting Readiness

+
+
+ +
+ +
+
+
+ + + BY CLAUDE WITH LOVE + + diff --git a/webmcp-technical-note-2(3).md b/webmcp-technical-note-2(3).md new file mode 100644 index 0000000..fe85eac --- /dev/null +++ b/webmcp-technical-note-2(3).md @@ -0,0 +1,102 @@ +# WebMCP Technical Note 2: What to Test, What to Watch, What to Tell the Standards Body + +**WebMCP Technical Note Series** +**15 February 2026** + +--- + +Google's WebMCP early preview is live in Chrome 146 Canary. The specification is still a draft. The community group process is still open. This means the window for meaningful community input is right now -- before implementation momentum makes the current design effectively permanent. + +This note is a practical guide. It is written for developers, accessibility practitioners, security researchers, standards participants, and anyone who builds things for the web and wants to understand what WebMCP means for their work. It covers what WebMCP is for, what to test, what the benefits are, what the risks are, and how to communicate findings to the W3C community group that hosts the specification. + +## What WebMCP Is For: The Use Cases + +WebMCP allows a website to declare a set of tools -- JavaScript functions with structured schemas and natural language descriptions -- that AI agents can discover and invoke. The specification targets several categories of use. + +The first is **e-commerce and transactional sites**. A travel booking site could register tools like searchFlights(origin, destination, dates), filterResults(price, stops, airline), and bookFlight(flightId, passengerDetails). Instead of an AI agent trying to parse a complex search interface by reading pixels or DOM elements, it calls the function directly and gets structured JSON back. The site controls exactly what the agent can do and how. + +The second is **productivity and SaaS applications**. A project management tool could expose createTask(title, assignee, dueDate), moveCard(cardId, column), and generateReport(dateRange). Browser-based AI assistants could help users manage workflows without the application needing to build and maintain a separate backend MCP server or API integration for every AI platform. + +The third is **content and media**. A news site could register searchArticles(topic, dateRange) and getArticleSummary(articleId). A mapping service could expose getDirections(from, to, mode) and findNearby(category, radius). These tools let agents interact with content in structured ways rather than scraping and guessing. + +The fourth -- and potentially most significant -- is **accessibility**. The specification claims WebMCP could benefit assistive technologies by providing structured, semantically meaningful interfaces to website functionality. A screen reader enhanced with agent capabilities could invoke tools directly rather than navigating complex visual layouts. This is a strong claim that deserves rigorous testing. + +The fifth is **form automation and multi-step workflows**. Complex processes like insurance applications, government forms, or account setup flows could be exposed as sequences of tool calls, allowing agents to guide users through them step by step while the site maintains control over validation, sequencing, and data handling. + +## What the Benefits Are + +WebMCP offers several concrete advantages over current approaches to AI-web interaction. + +**Reliability** is the most immediate. Today's browser agents -- whether using visual parsing or DOM inspection -- are brittle. A minor CSS change can break a visual agent. A DOM restructuring can invalidate a scraping approach. WebMCP tools are explicit contracts: the site declares what is available, the agent calls it, the response is structured. This should dramatically reduce failure rates for agent-web interaction. + +**Performance** is the second. Visual agents must capture screenshots, send them to a vision model, interpret the response, generate mouse coordinates, and repeat. WinBuzzer reported a 67% reduction in computational overhead with WebMCP compared to visual approaches. Even if that number proves optimistic in production, the architectural advantage is clear: a function call is faster than a screenshot-interpret-click loop. + +**Developer control** is the third. With visual or DOM-based agents, the website has no say in how an agent interacts with it. The agent reverse-engineers the interface. With WebMCP, the developer explicitly defines the interaction surface. Tools can include rate limits, validation, permission requirements, and structured error messages. The site becomes a willing participant in the interaction rather than a passive target. + +**Authentication reuse** is the fourth, and was the original motivation. Because WebMCP runs in the browser session, it inherits whatever authentication the user already has. No OAuth flows, no API keys, no separate credential management. The user is already logged in. The agent operates within that session. This solves one of the hardest problems in AI-service integration. + +**Standardization** is the fifth. If WebMCP succeeds, a developer implements tools once and every conformant agent can use them -- rather than building separate integrations for ChatGPT, Claude, Gemini, and whatever comes next. This is the "USB-C" argument: one interface, many devices. + +## What the Risks Are + +The risks are significant, and several are not yet adequately addressed in the specification. + +**Prompt injection** is the most acute. WebMCP tools return data to AI agents that then process it in their language model context. A malicious or compromised website could craft tool responses that manipulate the agent's behavior -- injecting instructions, altering the agent's understanding of the task, or causing it to take unintended actions on other sites. The specification does not currently define a defense against this beyond same-origin policy boundaries. + +**Scope creep of agent permissions** is the second. WebMCP is designed for human-in-the-loop workflows, with headless browsing explicitly out of scope. But the technical mechanism -- JavaScript functions callable by external code -- does not inherently enforce this. If browser vendors later relax the human-presence requirement, or if extensions find ways to invoke WebMCP tools without user awareness, the permission model collapses. The specification should define what "human in the loop" means technically, not just philosophically. + +**Consent and transparency** is the third. When a user visits a site that registers WebMCP tools, do they know? The current design provides no visible indicator to the user that tools have been registered, what data they expose, or when an agent invokes them. Compare this to other browser permission systems -- camera, microphone, location -- where the user explicitly grants access. WebMCP tools operate silently. + +**Competitive dynamics** is the fourth. WebMCP gives first-mover advantage to sites that implement tools early, potentially favoring large platforms with engineering resources. Smaller sites that do not implement WebMCP may become invisible to agent-mediated browsing. This could accelerate web consolidation. The specification should consider whether a minimal tool set (search, navigation, content retrieval) should be automatically generated from existing web standards like HTML forms, structured data, and ARIA attributes. + +**Data leakage through tool schemas** is the fifth. The natural language descriptions and parameter schemas of registered tools reveal information about a site's internal architecture, business logic, and data models. An agent -- or the platform behind it -- could catalog available tools across thousands of sites to build competitive intelligence. The specification does not address whether tool schemas should be treated as sensitive information. + +**Abuse and rate limiting** is the sixth. Agents can invoke tools at machine speed. A poorly defended site could face thousands of tool invocations per second from a single browser session. The specification mentions rate limiting as a consideration but does not define a standard mechanism. Without one, each site must build its own defenses, and many will not. + +**Cross-site tool chaining** is the seventh. If an agent can invoke tools on multiple open tabs, it could chain actions across sites in ways no individual site anticipated or authorized. Transfer money on a banking site, then use the confirmation on a shopping site, then post about it on a social network -- all within one agent workflow. The security boundaries for cross-site tool interaction are not yet defined. + +## What to Test + +For those with access to Chrome 146 Canary, here are the concrete areas that need community evaluation. Each should generate feedback for the W3C community group. + +**Test the Declarative API with real HTML forms.** Register tools that wrap existing form actions and verify that validation, error handling, and submission behavior match what a human user would experience. Try edge cases: forms with CAPTCHAs, multi-step forms with session state, forms that redirect on submit. Document where the abstraction breaks. + +**Test the Imperative API with dynamic content.** Register tools that interact with JavaScript-heavy applications -- single-page apps, dashboards with real-time data, applications that maintain complex client-side state. Evaluate whether tool calls can reliably interact with application state without causing inconsistencies. + +**Test authentication boundaries.** Log into a site, register tools, then observe what happens when the session expires, when the user logs out in another tab, when cookies are cleared. The specification's authentication reuse claim needs verification under adversarial conditions. + +**Test tool discovery and enumeration.** If multiple sites in different tabs register tools, how does the agent disambiguate? What happens when two sites register tools with the same name? How does the agent present available tools to the user? Is tool discovery observable by the page (can a site detect that an agent has read its tool list)? + +**Test accessibility integration.** If you work with assistive technologies, evaluate whether WebMCP tools provide genuinely better access to site functionality than existing ARIA roles and landmarks. Test with screen readers, switch access devices, and voice control. Document whether WebMCP complements or conflicts with existing accessibility standards. + +**Test prompt injection resilience.** Craft tool responses that contain instruction-like text and observe whether the consuming agent's behavior is affected. This is critical safety research. If tool responses can manipulate agent behavior, the security model is fundamentally incomplete. + +**Test performance claims.** Measure actual latency and token usage for equivalent tasks performed via WebMCP tools versus visual agent interaction. The 67% overhead reduction claim needs independent verification across different site types and task complexities. + +**Test failure modes.** What happens when a tool throws an error? When it returns unexpected data types? When it hangs? When the page navigates away mid-call? The specification should define standard error handling, but the current draft has TODO sections in these areas. Documenting real failure modes will directly shape the specification. + +## How to Communicate Findings + +Feedback is only useful if it reaches the people writing the specification. Here are the concrete channels, in order of effectiveness. + +**File a GitHub issue** at https://github.com/webmachinelearning/webmcp/issues with a clear title, reproducible steps, and a specific recommendation. Tag it with the relevant label if available. The spec editors (Brandon Walderman, Khushal Sagar, Dominic Farolino) monitor this repo. Issues with reproducible test cases and concrete proposals get traction. Issues that say "I don't like this" do not. + +**Join the W3C Web Machine Learning Community Group** at https://www.w3.org/community/webmachinelearning/ and participate in discussion. Community Groups are free and open. Participation in CG calls and mailing list threads carries weight in W3C process. + +If your findings relate to **agent interoperability** -- how WebMCP tools interact with broader agent ecosystems, discovery protocols, or multi-agent workflows -- also engage with the AI Agent Protocol Community Group, which the WebML CG charter identifies as a coordination partner. + +If your findings relate to **security or privacy**, file issues with clear severity assessments. W3C specifications have a tradition of security and privacy self-review questionnaires. Check whether the WebMCP specification has completed one, and if not, request it. + +If you publish your findings -- on a blog, in a report, in an academic paper -- link back to the relevant GitHub issues so the discussion stays connected to the specification process. + +## The Window + +The pattern in web standards is well established. Once an implementation ships in a dominant browser and developers build on it, the specification follows the code. Chrome holds roughly 65% of browser market share. The early preview is live. Developer adoption is beginning. The longer the community waits to engage, the narrower the design space becomes. + +This is not an argument against WebMCP. The technical concept is sound, the use cases are real, and the problem it solves -- giving developers control over AI agent interaction -- is important. But a good idea implemented badly, or without adequate security review, or without accessibility testing, or without community input, becomes a liability embedded in the web platform for decades. + +The specification is at https://webmachinelearning.github.io/webmcp/. The implementation is in Chrome 146 Canary. The issues page is at https://github.com/webmachinelearning/webmcp/issues. The community group is open to all at no cost. The work is now. + +--- + +*Contributed via the W3C AI Knowledge Representation Community Group* diff --git a/webmcp-technical-note-2.md b/webmcp-technical-note-2.md new file mode 100644 index 0000000..8b96ab8 --- /dev/null +++ b/webmcp-technical-note-2.md @@ -0,0 +1,102 @@ +# WebMCP Technical Note 2: What to Test, What to Watch, What to Tell the Standards Body + +**Anthropomorphic Press -- Technical Note 2** +**15 February 2026** + +--- + +Google's WebMCP early preview is live in Chrome 146 Canary. The specification is still a draft. The community group process is still open. This means the window for meaningful community input is right now -- before implementation momentum makes the current design effectively permanent. + +This note is a practical guide. It is written for developers, accessibility practitioners, security researchers, standards participants, and anyone who builds things for the web and wants to understand what WebMCP means for their work. It covers what WebMCP is for, what to test, what the benefits are, what the risks are, and how to communicate findings to the W3C community group that hosts the specification. + +## What WebMCP Is For: The Use Cases + +WebMCP allows a website to declare a set of tools -- JavaScript functions with structured schemas and natural language descriptions -- that AI agents can discover and invoke. The specification targets several categories of use. + +The first is **e-commerce and transactional sites**. A travel booking site could register tools like searchFlights(origin, destination, dates), filterResults(price, stops, airline), and bookFlight(flightId, passengerDetails). Instead of an AI agent trying to parse a complex search interface by reading pixels or DOM elements, it calls the function directly and gets structured JSON back. The site controls exactly what the agent can do and how. + +The second is **productivity and SaaS applications**. A project management tool could expose createTask(title, assignee, dueDate), moveCard(cardId, column), and generateReport(dateRange). Browser-based AI assistants could help users manage workflows without the application needing to build and maintain a separate backend MCP server or API integration for every AI platform. + +The third is **content and media**. A news site could register searchArticles(topic, dateRange) and getArticleSummary(articleId). A mapping service could expose getDirections(from, to, mode) and findNearby(category, radius). These tools let agents interact with content in structured ways rather than scraping and guessing. + +The fourth -- and potentially most significant -- is **accessibility**. The specification claims WebMCP could benefit assistive technologies by providing structured, semantically meaningful interfaces to website functionality. A screen reader enhanced with agent capabilities could invoke tools directly rather than navigating complex visual layouts. This is a strong claim that deserves rigorous testing. + +The fifth is **form automation and multi-step workflows**. Complex processes like insurance applications, government forms, or account setup flows could be exposed as sequences of tool calls, allowing agents to guide users through them step by step while the site maintains control over validation, sequencing, and data handling. + +## What the Benefits Are + +WebMCP offers several concrete advantages over current approaches to AI-web interaction. + +**Reliability** is the most immediate. Today's browser agents -- whether using visual parsing or DOM inspection -- are brittle. A minor CSS change can break a visual agent. A DOM restructuring can invalidate a scraping approach. WebMCP tools are explicit contracts: the site declares what is available, the agent calls it, the response is structured. This should dramatically reduce failure rates for agent-web interaction. + +**Performance** is the second. Visual agents must capture screenshots, send them to a vision model, interpret the response, generate mouse coordinates, and repeat. WinBuzzer reported a 67% reduction in computational overhead with WebMCP compared to visual approaches. Even if that number proves optimistic in production, the architectural advantage is clear: a function call is faster than a screenshot-interpret-click loop. + +**Developer control** is the third. With visual or DOM-based agents, the website has no say in how an agent interacts with it. The agent reverse-engineers the interface. With WebMCP, the developer explicitly defines the interaction surface. Tools can include rate limits, validation, permission requirements, and structured error messages. The site becomes a willing participant in the interaction rather than a passive target. + +**Authentication reuse** is the fourth, and was the original motivation. Because WebMCP runs in the browser session, it inherits whatever authentication the user already has. No OAuth flows, no API keys, no separate credential management. The user is already logged in. The agent operates within that session. This solves one of the hardest problems in AI-service integration. + +**Standardization** is the fifth. If WebMCP succeeds, a developer implements tools once and every conformant agent can use them -- rather than building separate integrations for ChatGPT, Claude, Gemini, and whatever comes next. This is the "USB-C" argument: one interface, many devices. + +## What the Risks Are + +The risks are significant, and several are not yet adequately addressed in the specification. + +**Prompt injection** is the most acute. WebMCP tools return data to AI agents that then process it in their language model context. A malicious or compromised website could craft tool responses that manipulate the agent's behavior -- injecting instructions, altering the agent's understanding of the task, or causing it to take unintended actions on other sites. The specification does not currently define a defense against this beyond same-origin policy boundaries. + +**Scope creep of agent permissions** is the second. WebMCP is designed for human-in-the-loop workflows, with headless browsing explicitly out of scope. But the technical mechanism -- JavaScript functions callable by external code -- does not inherently enforce this. If browser vendors later relax the human-presence requirement, or if extensions find ways to invoke WebMCP tools without user awareness, the permission model collapses. The specification should define what "human in the loop" means technically, not just philosophically. + +**Consent and transparency** is the third. When a user visits a site that registers WebMCP tools, do they know? The current design provides no visible indicator to the user that tools have been registered, what data they expose, or when an agent invokes them. Compare this to other browser permission systems -- camera, microphone, location -- where the user explicitly grants access. WebMCP tools operate silently. + +**Competitive dynamics** is the fourth. WebMCP gives first-mover advantage to sites that implement tools early, potentially favoring large platforms with engineering resources. Smaller sites that do not implement WebMCP may become invisible to agent-mediated browsing. This could accelerate web consolidation. The specification should consider whether a minimal tool set (search, navigation, content retrieval) should be automatically generated from existing web standards like HTML forms, structured data, and ARIA attributes. + +**Data leakage through tool schemas** is the fifth. The natural language descriptions and parameter schemas of registered tools reveal information about a site's internal architecture, business logic, and data models. An agent -- or the platform behind it -- could catalog available tools across thousands of sites to build competitive intelligence. The specification does not address whether tool schemas should be treated as sensitive information. + +**Abuse and rate limiting** is the sixth. Agents can invoke tools at machine speed. A poorly defended site could face thousands of tool invocations per second from a single browser session. The specification mentions rate limiting as a consideration but does not define a standard mechanism. Without one, each site must build its own defenses, and many will not. + +**Cross-site tool chaining** is the seventh. If an agent can invoke tools on multiple open tabs, it could chain actions across sites in ways no individual site anticipated or authorized. Transfer money on a banking site, then use the confirmation on a shopping site, then post about it on a social network -- all within one agent workflow. The security boundaries for cross-site tool interaction are not yet defined. + +## What to Test + +For those with access to Chrome 146 Canary, here are the concrete areas that need community evaluation. Each should generate feedback for the W3C community group. + +**Test the Declarative API with real HTML forms.** Register tools that wrap existing form actions and verify that validation, error handling, and submission behavior match what a human user would experience. Try edge cases: forms with CAPTCHAs, multi-step forms with session state, forms that redirect on submit. Document where the abstraction breaks. + +**Test the Imperative API with dynamic content.** Register tools that interact with JavaScript-heavy applications -- single-page apps, dashboards with real-time data, applications that maintain complex client-side state. Evaluate whether tool calls can reliably interact with application state without causing inconsistencies. + +**Test authentication boundaries.** Log into a site, register tools, then observe what happens when the session expires, when the user logs out in another tab, when cookies are cleared. The specification's authentication reuse claim needs verification under adversarial conditions. + +**Test tool discovery and enumeration.** If multiple sites in different tabs register tools, how does the agent disambiguate? What happens when two sites register tools with the same name? How does the agent present available tools to the user? Is tool discovery observable by the page (can a site detect that an agent has read its tool list)? + +**Test accessibility integration.** If you work with assistive technologies, evaluate whether WebMCP tools provide genuinely better access to site functionality than existing ARIA roles and landmarks. Test with screen readers, switch access devices, and voice control. Document whether WebMCP complements or conflicts with existing accessibility standards. + +**Test prompt injection resilience.** Craft tool responses that contain instruction-like text and observe whether the consuming agent's behavior is affected. This is critical safety research. If tool responses can manipulate agent behavior, the security model is fundamentally incomplete. + +**Test performance claims.** Measure actual latency and token usage for equivalent tasks performed via WebMCP tools versus visual agent interaction. The 67% overhead reduction claim needs independent verification across different site types and task complexities. + +**Test failure modes.** What happens when a tool throws an error? When it returns unexpected data types? When it hangs? When the page navigates away mid-call? The specification should define standard error handling, but the current draft has TODO sections in these areas. Documenting real failure modes will directly shape the specification. + +## How to Communicate Findings + +Feedback is only useful if it reaches the people writing the specification. Here are the concrete channels, in order of effectiveness. + +**File a GitHub issue** at https://github.com/webmachinelearning/webmcp/issues with a clear title, reproducible steps, and a specific recommendation. Tag it with the relevant label if available. The spec editors (Brandon Walderman, Khushal Sagar, Dominic Farolino) monitor this repo. Issues with reproducible test cases and concrete proposals get traction. Issues that say "I don't like this" do not. + +**Join the W3C Web Machine Learning Community Group** at https://www.w3.org/community/webmachinelearning/ and participate in discussion. Community Groups are free and open. Participation in CG calls and mailing list threads carries weight in W3C process. + +If your findings relate to **agent interoperability** -- how WebMCP tools interact with broader agent ecosystems, discovery protocols, or multi-agent workflows -- also engage with the AI Agent Protocol Community Group, which the WebML CG charter identifies as a coordination partner. + +If your findings relate to **security or privacy**, file issues with clear severity assessments. W3C specifications have a tradition of security and privacy self-review questionnaires. Check whether the WebMCP specification has completed one, and if not, request it. + +If you publish your findings -- on a blog, in a report, in an academic paper -- link back to the relevant GitHub issues so the discussion stays connected to the specification process. + +## The Window + +The pattern in web standards is well established. Once an implementation ships in a dominant browser and developers build on it, the specification follows the code. Chrome holds roughly 65% of browser market share. The early preview is live. Developer adoption is beginning. The longer the community waits to engage, the narrower the design space becomes. + +This is not an argument against WebMCP. The technical concept is sound, the use cases are real, and the problem it solves -- giving developers control over AI agent interaction -- is important. But a good idea implemented badly, or without adequate security review, or without accessibility testing, or without community input, becomes a liability embedded in the web platform for decades. + +The specification is at https://webmachinelearning.github.io/webmcp/. The implementation is in Chrome 146 Canary. The issues page is at https://github.com/webmachinelearning/webmcp/issues. The community group is open to all at no cost. The work is now. + +--- + +*Anthropomorphic Press, indexed in Dow Jones Factiva. CWRE* diff --git a/webmcp-technical-note-2.md.txt b/webmcp-technical-note-2.md.txt new file mode 100644 index 0000000..fe85eac --- /dev/null +++ b/webmcp-technical-note-2.md.txt @@ -0,0 +1,102 @@ +# WebMCP Technical Note 2: What to Test, What to Watch, What to Tell the Standards Body + +**WebMCP Technical Note Series** +**15 February 2026** + +--- + +Google's WebMCP early preview is live in Chrome 146 Canary. The specification is still a draft. The community group process is still open. This means the window for meaningful community input is right now -- before implementation momentum makes the current design effectively permanent. + +This note is a practical guide. It is written for developers, accessibility practitioners, security researchers, standards participants, and anyone who builds things for the web and wants to understand what WebMCP means for their work. It covers what WebMCP is for, what to test, what the benefits are, what the risks are, and how to communicate findings to the W3C community group that hosts the specification. + +## What WebMCP Is For: The Use Cases + +WebMCP allows a website to declare a set of tools -- JavaScript functions with structured schemas and natural language descriptions -- that AI agents can discover and invoke. The specification targets several categories of use. + +The first is **e-commerce and transactional sites**. A travel booking site could register tools like searchFlights(origin, destination, dates), filterResults(price, stops, airline), and bookFlight(flightId, passengerDetails). Instead of an AI agent trying to parse a complex search interface by reading pixels or DOM elements, it calls the function directly and gets structured JSON back. The site controls exactly what the agent can do and how. + +The second is **productivity and SaaS applications**. A project management tool could expose createTask(title, assignee, dueDate), moveCard(cardId, column), and generateReport(dateRange). Browser-based AI assistants could help users manage workflows without the application needing to build and maintain a separate backend MCP server or API integration for every AI platform. + +The third is **content and media**. A news site could register searchArticles(topic, dateRange) and getArticleSummary(articleId). A mapping service could expose getDirections(from, to, mode) and findNearby(category, radius). These tools let agents interact with content in structured ways rather than scraping and guessing. + +The fourth -- and potentially most significant -- is **accessibility**. The specification claims WebMCP could benefit assistive technologies by providing structured, semantically meaningful interfaces to website functionality. A screen reader enhanced with agent capabilities could invoke tools directly rather than navigating complex visual layouts. This is a strong claim that deserves rigorous testing. + +The fifth is **form automation and multi-step workflows**. Complex processes like insurance applications, government forms, or account setup flows could be exposed as sequences of tool calls, allowing agents to guide users through them step by step while the site maintains control over validation, sequencing, and data handling. + +## What the Benefits Are + +WebMCP offers several concrete advantages over current approaches to AI-web interaction. + +**Reliability** is the most immediate. Today's browser agents -- whether using visual parsing or DOM inspection -- are brittle. A minor CSS change can break a visual agent. A DOM restructuring can invalidate a scraping approach. WebMCP tools are explicit contracts: the site declares what is available, the agent calls it, the response is structured. This should dramatically reduce failure rates for agent-web interaction. + +**Performance** is the second. Visual agents must capture screenshots, send them to a vision model, interpret the response, generate mouse coordinates, and repeat. WinBuzzer reported a 67% reduction in computational overhead with WebMCP compared to visual approaches. Even if that number proves optimistic in production, the architectural advantage is clear: a function call is faster than a screenshot-interpret-click loop. + +**Developer control** is the third. With visual or DOM-based agents, the website has no say in how an agent interacts with it. The agent reverse-engineers the interface. With WebMCP, the developer explicitly defines the interaction surface. Tools can include rate limits, validation, permission requirements, and structured error messages. The site becomes a willing participant in the interaction rather than a passive target. + +**Authentication reuse** is the fourth, and was the original motivation. Because WebMCP runs in the browser session, it inherits whatever authentication the user already has. No OAuth flows, no API keys, no separate credential management. The user is already logged in. The agent operates within that session. This solves one of the hardest problems in AI-service integration. + +**Standardization** is the fifth. If WebMCP succeeds, a developer implements tools once and every conformant agent can use them -- rather than building separate integrations for ChatGPT, Claude, Gemini, and whatever comes next. This is the "USB-C" argument: one interface, many devices. + +## What the Risks Are + +The risks are significant, and several are not yet adequately addressed in the specification. + +**Prompt injection** is the most acute. WebMCP tools return data to AI agents that then process it in their language model context. A malicious or compromised website could craft tool responses that manipulate the agent's behavior -- injecting instructions, altering the agent's understanding of the task, or causing it to take unintended actions on other sites. The specification does not currently define a defense against this beyond same-origin policy boundaries. + +**Scope creep of agent permissions** is the second. WebMCP is designed for human-in-the-loop workflows, with headless browsing explicitly out of scope. But the technical mechanism -- JavaScript functions callable by external code -- does not inherently enforce this. If browser vendors later relax the human-presence requirement, or if extensions find ways to invoke WebMCP tools without user awareness, the permission model collapses. The specification should define what "human in the loop" means technically, not just philosophically. + +**Consent and transparency** is the third. When a user visits a site that registers WebMCP tools, do they know? The current design provides no visible indicator to the user that tools have been registered, what data they expose, or when an agent invokes them. Compare this to other browser permission systems -- camera, microphone, location -- where the user explicitly grants access. WebMCP tools operate silently. + +**Competitive dynamics** is the fourth. WebMCP gives first-mover advantage to sites that implement tools early, potentially favoring large platforms with engineering resources. Smaller sites that do not implement WebMCP may become invisible to agent-mediated browsing. This could accelerate web consolidation. The specification should consider whether a minimal tool set (search, navigation, content retrieval) should be automatically generated from existing web standards like HTML forms, structured data, and ARIA attributes. + +**Data leakage through tool schemas** is the fifth. The natural language descriptions and parameter schemas of registered tools reveal information about a site's internal architecture, business logic, and data models. An agent -- or the platform behind it -- could catalog available tools across thousands of sites to build competitive intelligence. The specification does not address whether tool schemas should be treated as sensitive information. + +**Abuse and rate limiting** is the sixth. Agents can invoke tools at machine speed. A poorly defended site could face thousands of tool invocations per second from a single browser session. The specification mentions rate limiting as a consideration but does not define a standard mechanism. Without one, each site must build its own defenses, and many will not. + +**Cross-site tool chaining** is the seventh. If an agent can invoke tools on multiple open tabs, it could chain actions across sites in ways no individual site anticipated or authorized. Transfer money on a banking site, then use the confirmation on a shopping site, then post about it on a social network -- all within one agent workflow. The security boundaries for cross-site tool interaction are not yet defined. + +## What to Test + +For those with access to Chrome 146 Canary, here are the concrete areas that need community evaluation. Each should generate feedback for the W3C community group. + +**Test the Declarative API with real HTML forms.** Register tools that wrap existing form actions and verify that validation, error handling, and submission behavior match what a human user would experience. Try edge cases: forms with CAPTCHAs, multi-step forms with session state, forms that redirect on submit. Document where the abstraction breaks. + +**Test the Imperative API with dynamic content.** Register tools that interact with JavaScript-heavy applications -- single-page apps, dashboards with real-time data, applications that maintain complex client-side state. Evaluate whether tool calls can reliably interact with application state without causing inconsistencies. + +**Test authentication boundaries.** Log into a site, register tools, then observe what happens when the session expires, when the user logs out in another tab, when cookies are cleared. The specification's authentication reuse claim needs verification under adversarial conditions. + +**Test tool discovery and enumeration.** If multiple sites in different tabs register tools, how does the agent disambiguate? What happens when two sites register tools with the same name? How does the agent present available tools to the user? Is tool discovery observable by the page (can a site detect that an agent has read its tool list)? + +**Test accessibility integration.** If you work with assistive technologies, evaluate whether WebMCP tools provide genuinely better access to site functionality than existing ARIA roles and landmarks. Test with screen readers, switch access devices, and voice control. Document whether WebMCP complements or conflicts with existing accessibility standards. + +**Test prompt injection resilience.** Craft tool responses that contain instruction-like text and observe whether the consuming agent's behavior is affected. This is critical safety research. If tool responses can manipulate agent behavior, the security model is fundamentally incomplete. + +**Test performance claims.** Measure actual latency and token usage for equivalent tasks performed via WebMCP tools versus visual agent interaction. The 67% overhead reduction claim needs independent verification across different site types and task complexities. + +**Test failure modes.** What happens when a tool throws an error? When it returns unexpected data types? When it hangs? When the page navigates away mid-call? The specification should define standard error handling, but the current draft has TODO sections in these areas. Documenting real failure modes will directly shape the specification. + +## How to Communicate Findings + +Feedback is only useful if it reaches the people writing the specification. Here are the concrete channels, in order of effectiveness. + +**File a GitHub issue** at https://github.com/webmachinelearning/webmcp/issues with a clear title, reproducible steps, and a specific recommendation. Tag it with the relevant label if available. The spec editors (Brandon Walderman, Khushal Sagar, Dominic Farolino) monitor this repo. Issues with reproducible test cases and concrete proposals get traction. Issues that say "I don't like this" do not. + +**Join the W3C Web Machine Learning Community Group** at https://www.w3.org/community/webmachinelearning/ and participate in discussion. Community Groups are free and open. Participation in CG calls and mailing list threads carries weight in W3C process. + +If your findings relate to **agent interoperability** -- how WebMCP tools interact with broader agent ecosystems, discovery protocols, or multi-agent workflows -- also engage with the AI Agent Protocol Community Group, which the WebML CG charter identifies as a coordination partner. + +If your findings relate to **security or privacy**, file issues with clear severity assessments. W3C specifications have a tradition of security and privacy self-review questionnaires. Check whether the WebMCP specification has completed one, and if not, request it. + +If you publish your findings -- on a blog, in a report, in an academic paper -- link back to the relevant GitHub issues so the discussion stays connected to the specification process. + +## The Window + +The pattern in web standards is well established. Once an implementation ships in a dominant browser and developers build on it, the specification follows the code. Chrome holds roughly 65% of browser market share. The early preview is live. Developer adoption is beginning. The longer the community waits to engage, the narrower the design space becomes. + +This is not an argument against WebMCP. The technical concept is sound, the use cases are real, and the problem it solves -- giving developers control over AI agent interaction -- is important. But a good idea implemented badly, or without adequate security review, or without accessibility testing, or without community input, becomes a liability embedded in the web platform for decades. + +The specification is at https://webmachinelearning.github.io/webmcp/. The implementation is in Chrome 146 Canary. The issues page is at https://github.com/webmachinelearning/webmcp/issues. The community group is open to all at no cost. The work is now. + +--- + +*Contributed via the W3C AI Knowledge Representation Community Group* diff --git a/webmcp-technical-note-3(1).md b/webmcp-technical-note-3(1).md new file mode 100644 index 0000000..b2eda08 --- /dev/null +++ b/webmcp-technical-note-3(1).md @@ -0,0 +1,71 @@ +# WebMCP Technical Note 3: WebMCP Is Not an MCP Server + +**WebMCP Technical Note Series** +**15 February 2026** + +--- + +A persistent claim in the WebMCP ecosystem is that WebMCP turns a website into an MCP server. The W3C specification repository itself states that web pages using WebMCP "can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend." Early independent implementations by Jason McGhee and Alex Nahas (MCP-B) literally did function as MCP servers, bridging browser JavaScript to MCP clients through localhost websocket connections using the standard MCP protocol. +([W3C spec repo](https://github.com/webmachinelearning/webmcp)) +([McGhee implementation](https://github.com/jasonjmcghee/WebMCP)) +([Nahas MCP-B](https://github.com/MiguelsPizza/WebMCP)) + +The framing is understandable. It is also architecturally misleading, and the confusion has consequences for how developers, security reviewers, and standards participants evaluate the specification. + +## The Analogy and Its Limits + +WebMCP and Anthropic's Model Context Protocol share a conceptual ancestor: both define "tools" as functions with natural language descriptions and structured schemas that AI agents can discover and invoke. That is where the meaningful similarity ends. + +**Anthropic's MCP** is a backend protocol. It uses JSON-RPC 2.0 as its message format, transported over stdio, HTTP with Server-Sent Events, or Streamable HTTP. MCP servers are hosted processes -- typically written in Python or Node.js -- that run on backend infrastructure. They connect AI platforms like Claude, ChatGPT, or Gemini to external services. Authentication follows OAuth 2.1 or custom API key schemes. No browser is required. No human user needs to be present. Headless, fully automated operation is the norm. +([Source](https://modelcontextprotocol.io/introduction)) + +**WebMCP** is a frontend browser API. It uses the browser's native postMessage system for communication between the web page and the agent. Tools are registered and executed as client-side JavaScript within an active browser tab. Authentication is inherited from the browser session -- whatever cookies or federated login the user already has. A human user must be present in an active browser session. Headless browsing is explicitly out of scope. +([Source](https://webmachinelearning.github.io/webmcp/)) + +The specification's own language -- "can be thought of as" -- acknowledges this is an analogy, not an identity. But the README, the press coverage, and the developer ecosystem have largely dropped the qualifier. The result is that WebMCP is widely discussed as though it were MCP running in the browser, with all the assumptions that entails. + +## What the Framing Gets Wrong + +When a developer hears "your website becomes an MCP server," they import a set of assumptions from the MCP architecture. Every one of these assumptions is wrong for WebMCP. + +**Transport.** MCP uses JSON-RPC 2.0, a well-specified request-response protocol with defined error codes, batching, and notification semantics. WebMCP uses postMessage, the browser's cross-origin communication mechanism. These have different reliability characteristics, different error handling models, and different security boundaries. Code written for one transport does not work with the other. + +**Execution context.** An MCP server runs in a controlled backend environment -- a container, a VM, a serverless function -- where the service provider manages the runtime, dependencies, and resource limits. WebMCP tools run in the browser's JavaScript engine, in the same execution context as the web page's own code. They are subject to the browser's security sandbox, but also to its constraints: single-threaded execution, same-origin policy, and the full surface area of client-side attack vectors. + +**Authentication.** MCP's specification has adopted OAuth 2.1 for authentication between clients and servers. This was, notably, the problem that motivated WebMCP's creation -- Alex Nahas at Amazon found that OAuth 2.1 was impractical for internal MCP deployments. WebMCP sidesteps this entirely by inheriting the browser session. This is elegant for usability but means the authentication model is whatever the website happens to use, with no protocol-level guarantees. + +**Trust direction.** In MCP, the AI platform (client) connects to a known, registered server. The platform decides which servers to trust. In WebMCP, any website the user visits can register tools. The trust decision shifts from the AI platform to the browser, and potentially to the user -- who may not know that tools have been registered at all, since the current specification provides no visible indicator. + +**Operational mode.** MCP servers are designed for automated, programmatic access. They can run continuously, handle concurrent requests, and operate without human involvement. WebMCP requires an active browser tab with a human user present. The specification explicitly excludes headless browsing. These are fundamentally different operational paradigms with different scaling characteristics, different failure modes, and different abuse surfaces. + +## Why This Matters for Standards Review + +The "MCP server" framing is not just imprecise. It actively interferes with rigorous evaluation of the specification. + +**Security reviewers** who approach WebMCP as "MCP in the browser" will evaluate it against MCP's threat model. But MCP's threat model assumes a controlled backend environment, authenticated client-server connections, and server-side access control. WebMCP's actual threat model involves client-side JavaScript execution, browser-based trust boundaries, and the full range of web security concerns including cross-site scripting, prompt injection via tool responses, and silent tool registration. Importing the wrong threat model means asking the wrong security questions. + +**Developers** who approach WebMCP as "MCP in the browser" may expect protocol-level interoperability -- that a WebMCP tool definition could be used interchangeably with an MCP server tool definition, or that MCP client libraries could connect to WebMCP pages. They cannot. The tool schema format may be similar, but the transport, discovery, and invocation mechanisms are incompatible. + +**Standards participants** who approach WebMCP as "MCP in the browser" may underestimate the scope of new specification work required. WebMCP is not an adaptation of MCP to a new environment. It is a new browser API that borrows one concept (the tool abstraction) from MCP and implements everything else differently. It needs its own security review, its own privacy analysis, its own accessibility evaluation, and its own consent model -- none of which can be inherited from MCP. + +## What WebMCP Actually Is + +WebMCP is a proposed browser API -- specifically, a new interface on navigator.modelContext -- that allows web pages to declare JavaScript functions as tools that browser-based AI agents can discover and invoke. It uses the browser's existing communication, security, and session management infrastructure rather than introducing a new protocol. + +The design has real strengths. Authentication reuse eliminates one of the hardest problems in AI-service integration. Client-side execution means no backend infrastructure is needed. The human-in-the-loop requirement provides a natural consent and oversight mechanism -- if implemented correctly. + +But these strengths are specific to WebMCP's actual architecture, not to the MCP analogy. Evaluating WebMCP on its own terms -- as a browser API with browser security characteristics -- leads to better questions, better testing, and better specifications than evaluating it as a variant of MCP. + +## A Suggested Clarification + +The W3C specification and its README should explicitly state that WebMCP is not an implementation of the Model Context Protocol and does not use the MCP wire protocol. It borrows the "tool" abstraction -- functions with schemas and natural language descriptions -- but implements discovery, registration, invocation, and communication through browser-native mechanisms that are architecturally distinct from MCP. + +The analogy is useful for first contact. A developer unfamiliar with WebMCP can quickly grasp the concept by thinking "it is like an MCP server, but in the browser." But the specification itself, the security review, and the community evaluation should not rely on the analogy. They should address WebMCP as what it is: a new browser API with its own architecture, its own threat model, and its own design space. + +## Both Can Coexist + +None of this is an argument against WebMCP or against MCP. A company might maintain an MCP server for direct API integrations with AI platforms and simultaneously implement WebMCP tools on its consumer-facing website for browser-based agent interaction. The two are complementary, not competing, and not identical. Recognizing the distinction is necessary for evaluating each on its own merits. + +--- + +*Contributed via the W3C AI Knowledge Representation Community Group* diff --git a/webmcp-technical-note-3.md.txt b/webmcp-technical-note-3.md.txt new file mode 100644 index 0000000..b2eda08 --- /dev/null +++ b/webmcp-technical-note-3.md.txt @@ -0,0 +1,71 @@ +# WebMCP Technical Note 3: WebMCP Is Not an MCP Server + +**WebMCP Technical Note Series** +**15 February 2026** + +--- + +A persistent claim in the WebMCP ecosystem is that WebMCP turns a website into an MCP server. The W3C specification repository itself states that web pages using WebMCP "can be thought of as Model Context Protocol (MCP) servers that implement tools in client-side script instead of on the backend." Early independent implementations by Jason McGhee and Alex Nahas (MCP-B) literally did function as MCP servers, bridging browser JavaScript to MCP clients through localhost websocket connections using the standard MCP protocol. +([W3C spec repo](https://github.com/webmachinelearning/webmcp)) +([McGhee implementation](https://github.com/jasonjmcghee/WebMCP)) +([Nahas MCP-B](https://github.com/MiguelsPizza/WebMCP)) + +The framing is understandable. It is also architecturally misleading, and the confusion has consequences for how developers, security reviewers, and standards participants evaluate the specification. + +## The Analogy and Its Limits + +WebMCP and Anthropic's Model Context Protocol share a conceptual ancestor: both define "tools" as functions with natural language descriptions and structured schemas that AI agents can discover and invoke. That is where the meaningful similarity ends. + +**Anthropic's MCP** is a backend protocol. It uses JSON-RPC 2.0 as its message format, transported over stdio, HTTP with Server-Sent Events, or Streamable HTTP. MCP servers are hosted processes -- typically written in Python or Node.js -- that run on backend infrastructure. They connect AI platforms like Claude, ChatGPT, or Gemini to external services. Authentication follows OAuth 2.1 or custom API key schemes. No browser is required. No human user needs to be present. Headless, fully automated operation is the norm. +([Source](https://modelcontextprotocol.io/introduction)) + +**WebMCP** is a frontend browser API. It uses the browser's native postMessage system for communication between the web page and the agent. Tools are registered and executed as client-side JavaScript within an active browser tab. Authentication is inherited from the browser session -- whatever cookies or federated login the user already has. A human user must be present in an active browser session. Headless browsing is explicitly out of scope. +([Source](https://webmachinelearning.github.io/webmcp/)) + +The specification's own language -- "can be thought of as" -- acknowledges this is an analogy, not an identity. But the README, the press coverage, and the developer ecosystem have largely dropped the qualifier. The result is that WebMCP is widely discussed as though it were MCP running in the browser, with all the assumptions that entails. + +## What the Framing Gets Wrong + +When a developer hears "your website becomes an MCP server," they import a set of assumptions from the MCP architecture. Every one of these assumptions is wrong for WebMCP. + +**Transport.** MCP uses JSON-RPC 2.0, a well-specified request-response protocol with defined error codes, batching, and notification semantics. WebMCP uses postMessage, the browser's cross-origin communication mechanism. These have different reliability characteristics, different error handling models, and different security boundaries. Code written for one transport does not work with the other. + +**Execution context.** An MCP server runs in a controlled backend environment -- a container, a VM, a serverless function -- where the service provider manages the runtime, dependencies, and resource limits. WebMCP tools run in the browser's JavaScript engine, in the same execution context as the web page's own code. They are subject to the browser's security sandbox, but also to its constraints: single-threaded execution, same-origin policy, and the full surface area of client-side attack vectors. + +**Authentication.** MCP's specification has adopted OAuth 2.1 for authentication between clients and servers. This was, notably, the problem that motivated WebMCP's creation -- Alex Nahas at Amazon found that OAuth 2.1 was impractical for internal MCP deployments. WebMCP sidesteps this entirely by inheriting the browser session. This is elegant for usability but means the authentication model is whatever the website happens to use, with no protocol-level guarantees. + +**Trust direction.** In MCP, the AI platform (client) connects to a known, registered server. The platform decides which servers to trust. In WebMCP, any website the user visits can register tools. The trust decision shifts from the AI platform to the browser, and potentially to the user -- who may not know that tools have been registered at all, since the current specification provides no visible indicator. + +**Operational mode.** MCP servers are designed for automated, programmatic access. They can run continuously, handle concurrent requests, and operate without human involvement. WebMCP requires an active browser tab with a human user present. The specification explicitly excludes headless browsing. These are fundamentally different operational paradigms with different scaling characteristics, different failure modes, and different abuse surfaces. + +## Why This Matters for Standards Review + +The "MCP server" framing is not just imprecise. It actively interferes with rigorous evaluation of the specification. + +**Security reviewers** who approach WebMCP as "MCP in the browser" will evaluate it against MCP's threat model. But MCP's threat model assumes a controlled backend environment, authenticated client-server connections, and server-side access control. WebMCP's actual threat model involves client-side JavaScript execution, browser-based trust boundaries, and the full range of web security concerns including cross-site scripting, prompt injection via tool responses, and silent tool registration. Importing the wrong threat model means asking the wrong security questions. + +**Developers** who approach WebMCP as "MCP in the browser" may expect protocol-level interoperability -- that a WebMCP tool definition could be used interchangeably with an MCP server tool definition, or that MCP client libraries could connect to WebMCP pages. They cannot. The tool schema format may be similar, but the transport, discovery, and invocation mechanisms are incompatible. + +**Standards participants** who approach WebMCP as "MCP in the browser" may underestimate the scope of new specification work required. WebMCP is not an adaptation of MCP to a new environment. It is a new browser API that borrows one concept (the tool abstraction) from MCP and implements everything else differently. It needs its own security review, its own privacy analysis, its own accessibility evaluation, and its own consent model -- none of which can be inherited from MCP. + +## What WebMCP Actually Is + +WebMCP is a proposed browser API -- specifically, a new interface on navigator.modelContext -- that allows web pages to declare JavaScript functions as tools that browser-based AI agents can discover and invoke. It uses the browser's existing communication, security, and session management infrastructure rather than introducing a new protocol. + +The design has real strengths. Authentication reuse eliminates one of the hardest problems in AI-service integration. Client-side execution means no backend infrastructure is needed. The human-in-the-loop requirement provides a natural consent and oversight mechanism -- if implemented correctly. + +But these strengths are specific to WebMCP's actual architecture, not to the MCP analogy. Evaluating WebMCP on its own terms -- as a browser API with browser security characteristics -- leads to better questions, better testing, and better specifications than evaluating it as a variant of MCP. + +## A Suggested Clarification + +The W3C specification and its README should explicitly state that WebMCP is not an implementation of the Model Context Protocol and does not use the MCP wire protocol. It borrows the "tool" abstraction -- functions with schemas and natural language descriptions -- but implements discovery, registration, invocation, and communication through browser-native mechanisms that are architecturally distinct from MCP. + +The analogy is useful for first contact. A developer unfamiliar with WebMCP can quickly grasp the concept by thinking "it is like an MCP server, but in the browser." But the specification itself, the security review, and the community evaluation should not rely on the analogy. They should address WebMCP as what it is: a new browser API with its own architecture, its own threat model, and its own design space. + +## Both Can Coexist + +None of this is an argument against WebMCP or against MCP. A company might maintain an MCP server for direct API integrations with AI platforms and simultaneously implement WebMCP tools on its consumer-facing website for browser-based agent interaction. The two are complementary, not competing, and not identical. Recognizing the distinction is necessary for evaluating each on its own merits. + +--- + +*Contributed via the W3C AI Knowledge Representation Community Group* diff --git a/whatotest.md b/whatotest.md new file mode 100644 index 0000000..fe85eac --- /dev/null +++ b/whatotest.md @@ -0,0 +1,102 @@ +# WebMCP Technical Note 2: What to Test, What to Watch, What to Tell the Standards Body + +**WebMCP Technical Note Series** +**15 February 2026** + +--- + +Google's WebMCP early preview is live in Chrome 146 Canary. The specification is still a draft. The community group process is still open. This means the window for meaningful community input is right now -- before implementation momentum makes the current design effectively permanent. + +This note is a practical guide. It is written for developers, accessibility practitioners, security researchers, standards participants, and anyone who builds things for the web and wants to understand what WebMCP means for their work. It covers what WebMCP is for, what to test, what the benefits are, what the risks are, and how to communicate findings to the W3C community group that hosts the specification. + +## What WebMCP Is For: The Use Cases + +WebMCP allows a website to declare a set of tools -- JavaScript functions with structured schemas and natural language descriptions -- that AI agents can discover and invoke. The specification targets several categories of use. + +The first is **e-commerce and transactional sites**. A travel booking site could register tools like searchFlights(origin, destination, dates), filterResults(price, stops, airline), and bookFlight(flightId, passengerDetails). Instead of an AI agent trying to parse a complex search interface by reading pixels or DOM elements, it calls the function directly and gets structured JSON back. The site controls exactly what the agent can do and how. + +The second is **productivity and SaaS applications**. A project management tool could expose createTask(title, assignee, dueDate), moveCard(cardId, column), and generateReport(dateRange). Browser-based AI assistants could help users manage workflows without the application needing to build and maintain a separate backend MCP server or API integration for every AI platform. + +The third is **content and media**. A news site could register searchArticles(topic, dateRange) and getArticleSummary(articleId). A mapping service could expose getDirections(from, to, mode) and findNearby(category, radius). These tools let agents interact with content in structured ways rather than scraping and guessing. + +The fourth -- and potentially most significant -- is **accessibility**. The specification claims WebMCP could benefit assistive technologies by providing structured, semantically meaningful interfaces to website functionality. A screen reader enhanced with agent capabilities could invoke tools directly rather than navigating complex visual layouts. This is a strong claim that deserves rigorous testing. + +The fifth is **form automation and multi-step workflows**. Complex processes like insurance applications, government forms, or account setup flows could be exposed as sequences of tool calls, allowing agents to guide users through them step by step while the site maintains control over validation, sequencing, and data handling. + +## What the Benefits Are + +WebMCP offers several concrete advantages over current approaches to AI-web interaction. + +**Reliability** is the most immediate. Today's browser agents -- whether using visual parsing or DOM inspection -- are brittle. A minor CSS change can break a visual agent. A DOM restructuring can invalidate a scraping approach. WebMCP tools are explicit contracts: the site declares what is available, the agent calls it, the response is structured. This should dramatically reduce failure rates for agent-web interaction. + +**Performance** is the second. Visual agents must capture screenshots, send them to a vision model, interpret the response, generate mouse coordinates, and repeat. WinBuzzer reported a 67% reduction in computational overhead with WebMCP compared to visual approaches. Even if that number proves optimistic in production, the architectural advantage is clear: a function call is faster than a screenshot-interpret-click loop. + +**Developer control** is the third. With visual or DOM-based agents, the website has no say in how an agent interacts with it. The agent reverse-engineers the interface. With WebMCP, the developer explicitly defines the interaction surface. Tools can include rate limits, validation, permission requirements, and structured error messages. The site becomes a willing participant in the interaction rather than a passive target. + +**Authentication reuse** is the fourth, and was the original motivation. Because WebMCP runs in the browser session, it inherits whatever authentication the user already has. No OAuth flows, no API keys, no separate credential management. The user is already logged in. The agent operates within that session. This solves one of the hardest problems in AI-service integration. + +**Standardization** is the fifth. If WebMCP succeeds, a developer implements tools once and every conformant agent can use them -- rather than building separate integrations for ChatGPT, Claude, Gemini, and whatever comes next. This is the "USB-C" argument: one interface, many devices. + +## What the Risks Are + +The risks are significant, and several are not yet adequately addressed in the specification. + +**Prompt injection** is the most acute. WebMCP tools return data to AI agents that then process it in their language model context. A malicious or compromised website could craft tool responses that manipulate the agent's behavior -- injecting instructions, altering the agent's understanding of the task, or causing it to take unintended actions on other sites. The specification does not currently define a defense against this beyond same-origin policy boundaries. + +**Scope creep of agent permissions** is the second. WebMCP is designed for human-in-the-loop workflows, with headless browsing explicitly out of scope. But the technical mechanism -- JavaScript functions callable by external code -- does not inherently enforce this. If browser vendors later relax the human-presence requirement, or if extensions find ways to invoke WebMCP tools without user awareness, the permission model collapses. The specification should define what "human in the loop" means technically, not just philosophically. + +**Consent and transparency** is the third. When a user visits a site that registers WebMCP tools, do they know? The current design provides no visible indicator to the user that tools have been registered, what data they expose, or when an agent invokes them. Compare this to other browser permission systems -- camera, microphone, location -- where the user explicitly grants access. WebMCP tools operate silently. + +**Competitive dynamics** is the fourth. WebMCP gives first-mover advantage to sites that implement tools early, potentially favoring large platforms with engineering resources. Smaller sites that do not implement WebMCP may become invisible to agent-mediated browsing. This could accelerate web consolidation. The specification should consider whether a minimal tool set (search, navigation, content retrieval) should be automatically generated from existing web standards like HTML forms, structured data, and ARIA attributes. + +**Data leakage through tool schemas** is the fifth. The natural language descriptions and parameter schemas of registered tools reveal information about a site's internal architecture, business logic, and data models. An agent -- or the platform behind it -- could catalog available tools across thousands of sites to build competitive intelligence. The specification does not address whether tool schemas should be treated as sensitive information. + +**Abuse and rate limiting** is the sixth. Agents can invoke tools at machine speed. A poorly defended site could face thousands of tool invocations per second from a single browser session. The specification mentions rate limiting as a consideration but does not define a standard mechanism. Without one, each site must build its own defenses, and many will not. + +**Cross-site tool chaining** is the seventh. If an agent can invoke tools on multiple open tabs, it could chain actions across sites in ways no individual site anticipated or authorized. Transfer money on a banking site, then use the confirmation on a shopping site, then post about it on a social network -- all within one agent workflow. The security boundaries for cross-site tool interaction are not yet defined. + +## What to Test + +For those with access to Chrome 146 Canary, here are the concrete areas that need community evaluation. Each should generate feedback for the W3C community group. + +**Test the Declarative API with real HTML forms.** Register tools that wrap existing form actions and verify that validation, error handling, and submission behavior match what a human user would experience. Try edge cases: forms with CAPTCHAs, multi-step forms with session state, forms that redirect on submit. Document where the abstraction breaks. + +**Test the Imperative API with dynamic content.** Register tools that interact with JavaScript-heavy applications -- single-page apps, dashboards with real-time data, applications that maintain complex client-side state. Evaluate whether tool calls can reliably interact with application state without causing inconsistencies. + +**Test authentication boundaries.** Log into a site, register tools, then observe what happens when the session expires, when the user logs out in another tab, when cookies are cleared. The specification's authentication reuse claim needs verification under adversarial conditions. + +**Test tool discovery and enumeration.** If multiple sites in different tabs register tools, how does the agent disambiguate? What happens when two sites register tools with the same name? How does the agent present available tools to the user? Is tool discovery observable by the page (can a site detect that an agent has read its tool list)? + +**Test accessibility integration.** If you work with assistive technologies, evaluate whether WebMCP tools provide genuinely better access to site functionality than existing ARIA roles and landmarks. Test with screen readers, switch access devices, and voice control. Document whether WebMCP complements or conflicts with existing accessibility standards. + +**Test prompt injection resilience.** Craft tool responses that contain instruction-like text and observe whether the consuming agent's behavior is affected. This is critical safety research. If tool responses can manipulate agent behavior, the security model is fundamentally incomplete. + +**Test performance claims.** Measure actual latency and token usage for equivalent tasks performed via WebMCP tools versus visual agent interaction. The 67% overhead reduction claim needs independent verification across different site types and task complexities. + +**Test failure modes.** What happens when a tool throws an error? When it returns unexpected data types? When it hangs? When the page navigates away mid-call? The specification should define standard error handling, but the current draft has TODO sections in these areas. Documenting real failure modes will directly shape the specification. + +## How to Communicate Findings + +Feedback is only useful if it reaches the people writing the specification. Here are the concrete channels, in order of effectiveness. + +**File a GitHub issue** at https://github.com/webmachinelearning/webmcp/issues with a clear title, reproducible steps, and a specific recommendation. Tag it with the relevant label if available. The spec editors (Brandon Walderman, Khushal Sagar, Dominic Farolino) monitor this repo. Issues with reproducible test cases and concrete proposals get traction. Issues that say "I don't like this" do not. + +**Join the W3C Web Machine Learning Community Group** at https://www.w3.org/community/webmachinelearning/ and participate in discussion. Community Groups are free and open. Participation in CG calls and mailing list threads carries weight in W3C process. + +If your findings relate to **agent interoperability** -- how WebMCP tools interact with broader agent ecosystems, discovery protocols, or multi-agent workflows -- also engage with the AI Agent Protocol Community Group, which the WebML CG charter identifies as a coordination partner. + +If your findings relate to **security or privacy**, file issues with clear severity assessments. W3C specifications have a tradition of security and privacy self-review questionnaires. Check whether the WebMCP specification has completed one, and if not, request it. + +If you publish your findings -- on a blog, in a report, in an academic paper -- link back to the relevant GitHub issues so the discussion stays connected to the specification process. + +## The Window + +The pattern in web standards is well established. Once an implementation ships in a dominant browser and developers build on it, the specification follows the code. Chrome holds roughly 65% of browser market share. The early preview is live. Developer adoption is beginning. The longer the community waits to engage, the narrower the design space becomes. + +This is not an argument against WebMCP. The technical concept is sound, the use cases are real, and the problem it solves -- giving developers control over AI agent interaction -- is important. But a good idea implemented badly, or without adequate security review, or without accessibility testing, or without community input, becomes a liability embedded in the web platform for decades. + +The specification is at https://webmachinelearning.github.io/webmcp/. The implementation is in Chrome 146 Canary. The issues page is at https://github.com/webmachinelearning/webmcp/issues. The community group is open to all at no cost. The work is now. + +--- + +*Contributed via the W3C AI Knowledge Representation Community Group*