Fix base64 deserialization when JSON encoder escapes forward slashes#1342
Fix base64 deserialization when JSON encoder escapes forward slashes#1342
Conversation
…k, AudioContentBlock, and BlobResourceContents The ContentBlock.Converter and ResourceContents.Converter were using reader.ValueSpan.ToArray() to read base64-encoded data from JSON strings. This reads raw bytes without unescaping JSON escape sequences. When base64 data contains '/' characters that are JSON-escaped as '\/' (a valid JSON escape used by some encoders), the backslash corrupts the base64 data, causing "Invalid base64 data" FormatException on access to DecodedData. Fix: Check reader.ValueIsEscaped and fall back to GetString() + UTF8 encoding when escape sequences are present, preserving the fast path for the common unescaped case. Fixes #1340 Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
When JSON-escaped base64 data is present, use reader.GetBytesFromBase64() to directly decode to bytes instead of going through an intermediate string with GetString() + Encoding.UTF8.GetBytes(). Then use FromBytes() factory methods which cache the decoded data and lazily encode to base64 UTF-8, avoiding unnecessary intermediate allocations. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
|
I wonder if we need anything special for MEAI's content types. I was researching that base64 URI avoids using slash |
Base64 and Base64Uri are two separate encoding schemas (same algorithm, different alphabets); they're not interchangeable. |
FromBytes now stores only the decoded bytes via a private constructor with [SetsRequiredMembers]. The Data/Blob property getter lazily encodes to base64 UTF-8 on first access. This avoids eager encoding when only DecodedData is consumed. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
|
I see - that explains why MEAI just goes through So I guess that means we're OK to directly consume the Base64Data property as "canonical" base64 UTF-8 bytes without any special handling? csharp-sdk/src/ModelContextProtocol.Core/AIContentExtensions.cs Lines 400 to 410 in fa56de9 We should still add tests for all this. Filing an issue. |
|
@copilot please add comprehensive roundtripping tests for all of the protocol types that work in terms of base64. Validate various lengths and inputs that will exercise the full base64 alphabet, with and without padding, and ensure that all the various ways these types can be constructed properly expose the encoded/decoded data. Add the new tests to existing test files wherever possible. |
Add Theory-based tests for ImageContentBlock, AudioContentBlock, and BlobResourceContents covering: - Various lengths (empty, 1-4 bytes, 256 bytes, 1024 bytes) - Full base64 alphabet including '+' and '/' characters - With/without padding (0, 1, 2 padding chars) - All construction paths: FromBytes(), Data/Blob setter, JSON deserialization - Escaped and unescaped JSON deserialization - Lazy encoding verification (FromBytes defers until Data/Blob accessed) - Cache invalidation when Data/Blob setter is used after FromBytes Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Added in cb078ad. Comprehensive
|
…ypes Add Theory-based tests validating conversions to/from AIContent for ImageContentBlock, AudioContentBlock, and BlobResourceContents: - ToAIContent preserves decoded data and media type - Full roundtrip through ToAIContent → ToContentBlock - Both FromBytes() and Data/Blob setter construction paths - DataContent → ContentBlock direction (image/audio/blob) - JSON deserialization (both escaped and unescaped) through AIContent roundtrips Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
… duplicative ones - Merge *_ToAIContent_PreservesData and *_ToAIContent_RoundTrips into single tests that validate both the intermediate DataContent and the roundtripped block - Remove duplicative DataContent_*ToContentBlock_PreservesData tests (already covered by the roundtrip tests) - Add missing MediaType assertions on intermediate DataContent for *_DataSetter and *_BlobSetter test variants Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
tests/ModelContextProtocol.Tests/Protocol/ResourceContentsTests.cs
Outdated
Show resolved
Hide resolved
…a from ResourceContentsTests and AIContentExtensionsTests Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
ContentBlock.Converter.ReadandResourceContents.Converter.Readto handle escaped JSON base64FromBytesfactory methodsBase64TestData()method across test filesOriginal prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.