Fix base64 deserialization when JSON encoder escapes forward slashes #1342

Open
Copilot wants to merge 9 commits into main from copilot/fix-imagecontentblock-base64-error

Conversation

Contributor

Copilot AI commented Feb 21, 2026

  • Fix ContentBlock.Converter.Read and ResourceContents.Converter.Read to handle escaped JSON base64
  • Lazy base64 encoding in FromBytes factory methods
  • Comprehensive base64 roundtripping tests for all protocol types
  • AIContent roundtrip tests for base64 types (consolidated per review feedback)
  • Deduplicate Base64TestData() method across test files
Original prompt

This section details the original issue you should resolve

<issue_title>0.9.0-preview.1: ImageContentBlock throws "invalid base64" when consuming CallToolResult (works in 0.8.0-preview.1)</issue_title>
<issue_description>

Title

0.9.0-preview.1: ImageContentBlock throws "invalid base64" when consuming CallToolResult (works in 0.8.0-preview.1)


Description

After upgrading from ModelContextProtocol 0.8.0-preview.1 to 0.9.0-preview.1, consuming an MCP server CallToolResult that includes an ImageContentBlock now throws an "invalid base64" exception.

The exact same MCP server response works correctly in:

  • 0.8.0-preview.1
  • Postman (image renders correctly)

This appears to be a regression introduced in 0.9.0-preview.1, possibly related to the change of ImageContentBlock.Data from string to ReadOnlyMemory<byte> and how JSON deserialization handles base64 content.


Expected Behavior

CallToolResult containing an ImageContentBlock should deserialize successfully and allow access to the image data (via DecodedData or Data) without throwing.

This worked correctly in 0.8.0-preview.1.


Actual Behavior

When consuming the tool result:

  • Accessing ImageContentBlock causes an "invalid base64" exception.
  • In some cases, accessing DecodedData triggers the exception.
  • The same payload renders correctly in Postman.
  • The server response contains valid base64 (confirmed visually and by rendering in Postman).

Repro Steps

  1. Use ModelContextProtocol 0.9.0-preview.1
  2. Connect to MCP server
  3. Invoke a tool that returns an ImageContentBlock
  4. Receive CallToolResult
  5. Access image content (e.g., DecodedData)

Result: FormatException / invalid base64 error

Reverting to 0.8.0-preview.1 resolves the issue immediately.


Sample Tool Result Payload (Redacted)

{
  "content": [
    {
      "type": "image",
      "mimeType": "image/png",
      "data": "<valid base64 string>"
    }
  ]
}

Notes:

  • No data:image/... prefix
  • Standard base64 alphabet
  • Image renders correctly in Postman

Environment

  • ModelContextProtocol: 0.9.0-preview.1
  • .NET: (e.g. .NET 10)
  • OS: (Windows)

Additional Context

This appears to be related to the 0.9 change where ImageContentBlock.Data is now ReadOnlyMemory<byte>. It may be caused by stricter base64 handling during JSON deserialization or a double-decode scenario.


</issue_description>

<agent_instructions>Fix this issue and ensure there are appropriate tests. One theory about the root cause and fix is in the branch copilot/reproduce-issue-1340</agent_instructions>

Comments on the Issue (you are @copilot in this section)


…k, AudioContentBlock, and BlobResourceContents

The ContentBlock.Converter and ResourceContents.Converter were using
reader.ValueSpan.ToArray() to read base64-encoded data from JSON strings.
This reads raw bytes without unescaping JSON escape sequences. When base64
data contains '/' characters that are JSON-escaped as '\/' (a valid JSON
escape used by some encoders), the backslash corrupts the base64 data,
causing "Invalid base64 data" FormatException on access to DecodedData.

Fix: Check reader.ValueIsEscaped and fall back to GetString() + UTF8
encoding when escape sequences are present, preserving the fast path
for the common unescaped case.

Fixes #1340

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
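
The failure mode and the fallback described in this commit message can be reproduced in isolation. The following is a minimal sketch, not the converter code from this PR: it assumes .NET 7 or later (for Utf8JsonReader.ValueIsEscaped) and uses a hand-written two-byte payload (FB FF, whose base64 form is "+/8=") chosen only because it contains a '/'.

```csharp
using System;
using System.Text;
using System.Text.Json;

// JSON string token whose '/' has been escaped as "\/" by the encoder.
// Illustrative payload: base64 "+/8=" is the encoding of bytes FB FF.
byte[] json = Encoding.UTF8.GetBytes("\"+\\/8=\"");

var reader = new Utf8JsonReader(json);
reader.Read(); // advance to the string token

// ValueSpan exposes the raw token bytes, so the backslash is still present.
string raw = Encoding.UTF8.GetString(reader.ValueSpan);
Console.WriteLine(reader.ValueIsEscaped); // True
Console.WriteLine(raw);                   // +\/8=

// Treating the raw bytes as base64 fails because '\' is not in the alphabet.
bool rawDecodes;
try { Convert.FromBase64String(raw); rawDecodes = true; }
catch (FormatException) { rawDecodes = false; }
Console.WriteLine(rawDecodes);            // False

// Fallback from the commit message: unescape via GetString() only when needed,
// keeping the ValueSpan fast path for the common unescaped case.
byte[] base64Utf8 = reader.ValueIsEscaped
    ? Encoding.UTF8.GetBytes(reader.GetString()!)
    : reader.ValueSpan.ToArray();
Console.WriteLine(Encoding.UTF8.GetString(base64Utf8)); // +/8=
```

The conditional keeps the allocation-light path for payloads without escape sequences, which is the case the commit message calls common.
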
stephentoub marked this pull request as ready for review February 21, 2026 05:33
Copilot AI changed the title from "[WIP] Fix invalid base64 error in ImageContentBlock" to "Fix base64 deserialization when JSON encoder escapes forward slashes" Feb 21, 2026
stephentoub added this to the 1.0.0 (GA) milestone Feb 22, 2026

When JSON-escaped base64 data is present, use reader.GetBytesFromBase64()
to directly decode to bytes instead of going through an intermediate
string with GetString() + Encoding.UTF8.GetBytes(). Then use FromBytes()
factory methods which cache the decoded data and lazily encode to base64
UTF-8, avoiding unnecessary intermediate allocations.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
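
As a rough illustration of the direct-decode path this commit describes, the sketch below reads the same kind of escaped token with Utf8JsonReader.GetBytesFromBase64(), which unescapes and decodes in one call. The payload is an illustrative assumption, and the FromBytes() factory methods mentioned above are not reproduced here.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Escaped JSON string token: "+\/8=" is base64 "+/8=" with the '/' escaped.
byte[] json = Encoding.UTF8.GetBytes("\"+\\/8=\"");

var reader = new Utf8JsonReader(json);
reader.Read(); // advance to the string token

// Decodes straight to bytes; no intermediate string is materialized by the caller.
byte[] decoded = reader.GetBytesFromBase64();
Console.WriteLine(Convert.ToHexString(decoded)); // FBFF
```
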
Collaborator

ericstj commented Feb 23, 2026

I wonder if we need anything special for MEAI's content types. I was researching that base64 URI avoids using slash / and + by replacing with other characters (_ and -). I wonder if our Base64 decoding in framework works with this. We should probably add tests for these cases. Where we use DataContent.Base64Data.

Contributor

stephentoub commented Feb 23, 2026

I wonder if we need anything special for MEAI's content types. I was researching that base64 URI avoids using slash / and + by replacing with other characters (_ and -). I wonder if our Base64 decoding in framework works with this. We should probably add tests for these cases. Where we use DataContent.Base64Data.

Base64 and Base64Url are two separate encoding schemes (same algorithm, different alphabets); they're not interchangeable.
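
A small sketch of why the two alphabets are not interchangeable, using only Convert from the base class library. The URL-safe form is produced here by character substitution purely for illustration, with padding kept; the two-byte input is an assumption chosen because its standard encoding contains both '+' and '/'.

```csharp
using System;

byte[] bytes = { 0xFB, 0xFF };

// Standard alphabet: this input encodes to "+/8=".
string standard = Convert.ToBase64String(bytes);

// URL-safe alphabet swaps '+' for '-' and '/' for '_' (padding kept here).
string urlSafe = standard.Replace('+', '-').Replace('/', '_');
Console.WriteLine($"{standard} vs {urlSafe}"); // +/8= vs -_8=

// The standard decoder rejects the URL-safe characters.
bool standardDecoderAcceptsUrlSafe;
try { Convert.FromBase64String(urlSafe); standardDecoderAcceptsUrlSafe = true; }
catch (FormatException) { standardDecoderAcceptsUrlSafe = false; }
Console.WriteLine(standardDecoderAcceptsUrlSafe); // False
```
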

FromBytes now stores only the decoded bytes via a private constructor
with [SetsRequiredMembers]. The Data/Blob property getter lazily encodes
to base64 UTF-8 on first access. This avoids eager encoding when only
DecodedData is consumed.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
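
The lazy-encoding idea in this commit can be sketched with Lazy<T>. This is an illustration of the general pattern only, under the assumption that a counter is an acceptable way to observe when encoding happens; it is not the private constructor or property getter added in the PR.

```csharp
using System;
using System.Text;

byte[] decoded = { 0xFB, 0xFF }; // illustrative payload
int encodeCalls = 0;             // counts how often the encoder actually runs

// The base64 UTF-8 form is computed on first access and cached afterwards.
var base64Utf8 = new Lazy<byte[]>(() =>
{
    encodeCalls++;
    return Encoding.UTF8.GetBytes(Convert.ToBase64String(decoded));
});

Console.WriteLine(encodeCalls); // 0: nothing encoded while only the bytes are used

string first = Encoding.UTF8.GetString(base64Utf8.Value);
string second = Encoding.UTF8.GetString(base64Utf8.Value);
Console.WriteLine(first);       // +/8=
Console.WriteLine(encodeCalls); // 1: the second access reuses the cached value
```

A consumer that only ever reads the decoded bytes never pays for the encoding step, which is the saving the commit message points at.
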
Collaborator

ericstj commented Feb 23, 2026

I see - that explains why MEAI just goes through Convert and not Base64Url. https://github.com/dotnet/extensions/blob/4939788c2c8fd6cd7af31900641f36a60318966c/src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataUriParser.cs#L204

So I guess that means we're OK to directly consume the Base64Data property as "canonical" base64 UTF-8 bytes without any special handling?

DataContent dataContent when dataContent.HasTopLevelMediaType("image") => new ImageContentBlock
{
    Data = EncodingUtilities.GetUtf8Bytes(dataContent.Base64Data.Span),
    MimeType = dataContent.MediaType,
},
DataContent dataContent when dataContent.HasTopLevelMediaType("audio") => new AudioContentBlock
{
    Data = EncodingUtilities.GetUtf8Bytes(dataContent.Base64Data.Span),
    MimeType = dataContent.MediaType,
},

We should still add tests for all this. Filing an issue.

ericstj previously approved these changes Feb 23, 2026
Collaborator

ericstj left a comment

Thank you 👍

@stephentoub
Contributor

@copilot please add comprehensive roundtripping tests for all of the protocol types that work in terms of base64. Validate various lengths and inputs that will exercise the full base64 alphabet, with and without padding, and ensure that all the various ways these types can be constructed properly expose the encoded/decoded data. Add the new tests to existing test files wherever possible.

Add Theory-based tests for ImageContentBlock, AudioContentBlock, and
BlobResourceContents covering:
- Various lengths (empty, 1-4 bytes, 256 bytes, 1024 bytes)
- Full base64 alphabet including '+' and '/' characters
- With/without padding (0, 1, 2 padding chars)
- All construction paths: FromBytes(), Data/Blob setter, JSON deserialization
- Escaped and unescaped JSON deserialization
- Lazy encoding verification (FromBytes defers until Data/Blob accessed)
- Cache invalidation when Data/Blob setter is used after FromBytes

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 23, 2026

@copilot please add comprehensive roundtripping tests for all of the protocol types that work in terms of base64. Validate various lengths and inputs that will exercise the full base64 alphabet, with ...

Added in cb078ad. Comprehensive [Theory]-based roundtripping tests for ImageContentBlock, AudioContentBlock, and BlobResourceContents with test data covering:

  • Various lengths: empty, 1 byte (2 padding), 2 bytes (1 padding), 3 bytes (no padding), 4 bytes, 256 bytes (all byte values → full base64 alphabet), 1024 bytes
  • All base64 alphabet characters including + and /
  • All construction paths: FromBytes(), Data/Blob property setter, JSON deserialization (escaped and unescaped)
  • Lazy encoding verification (FromBytes → DecodedData first, then Data)
  • Cache invalidation when Data/Blob setter is used after FromBytes
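
The length and alphabet choices in the list above follow from how base64 groups input into 3-byte blocks. The sketch below is not taken from the PR's test files; it only checks, with Convert and LINQ, that the listed lengths produce the stated padding and that a 256-byte buffer holding every byte value exercises all 64 alphabet characters.

```csharp
using System;
using System.Linq;

// Padding depends on length mod 3: 1 byte -> "==", 2 bytes -> "=", 3 bytes -> none.
foreach (int length in new[] { 1, 2, 3, 4 })
{
    string encoded = Convert.ToBase64String(new byte[length]);
    int padding = encoded.Count(c => c == '=');
    Console.WriteLine($"{length} byte(s): {encoded.Length} chars, {padding} padding");
}

// Bytes 0..255 in order cover every 6-bit value, so every alphabet character appears.
byte[] allBytes = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
string all = Convert.ToBase64String(allBytes);
int distinctAlphabetChars = all.Where(c => c != '=').Distinct().Count();
Console.WriteLine(distinctAlphabetChars);        // 64
Console.WriteLine(all.Contains('+') && all.Contains('/')); // True

// Roundtrip sanity check on the same data.
bool roundTrips = Convert.FromBase64String(all).SequenceEqual(allBytes);
Console.WriteLine(roundTrips);                   // True
```
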

…ypes

Add Theory-based tests validating conversions to/from AIContent for
ImageContentBlock, AudioContentBlock, and BlobResourceContents:
- ToAIContent preserves decoded data and media type
- Full roundtrip through ToAIContent → ToContentBlock
- Both FromBytes() and Data/Blob setter construction paths
- DataContent → ContentBlock direction (image/audio/blob)
- JSON deserialization (both escaped and unescaped) through AIContent roundtrips

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
… duplicative ones

- Merge *_ToAIContent_PreservesData and *_ToAIContent_RoundTrips into single tests
  that validate both the intermediate DataContent and the roundtripped block
- Remove duplicative DataContent_*ToContentBlock_PreservesData tests (already covered
  by the roundtrip tests)
- Add missing MediaType assertions on intermediate DataContent for *_DataSetter and
  *_BlobSetter test variants

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…a from ResourceContentsTests and AIContentExtensionsTests

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

0.9.0-preview.1: ImageContentBlock throws "invalid base64" when consuming CallToolResult (works in 0.8.0-preview.1)

3 participants