Skip to content

Add smithy-xml package#655

Open
arandito wants to merge 2 commits intosmithy-lang:feat/add-query-protocolfrom
arandito:add-smithy-xml
Open

Add smithy-xml package#655
arandito wants to merge 2 commits intosmithy-lang:feat/add-query-protocolfrom
arandito:add-smithy-xml

Conversation

@arandito
Copy link
Contributor

@arandito arandito commented Mar 11, 2026

Description

This PR adds an initial implementation of the new smithy-xml runtime package, which provides generic XML serialization and deserialization support for Smithy clients and servers.

This PR primarily implements the XML deserialization support needed by the awsQuery protocol. The serializer is intentionally omitted for now as awsQuery does not use XML request serialization. It will be introduced later alongside the restXml protocol, where XML request serialization is required and can be validated more thoroughly.

The implementation follows the XML binding rules defined in the Smithy awsQuery and restXml protocol specifications, including support for the following traits: @xmlName, @xmlAttribute, @xmlFlattened, and @xmlNamespace (defined in smithy-core).

Important

This implementation follows the Smithy Python Protocol Serialization and Deserialization design. I strongly recommend reviewing this document first to better understand the implementation.

Testing

Added round trip serde test cases that follow the same testing strategy as smithy-json.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@arandito arandito requested a review from a team as a code owner March 11, 2026 16:53
@arandito
Copy link
Contributor Author

arandito commented Mar 12, 2026

Update: XML serializers are omitted from this initial implementation as awsQuery does not use XML request serialization. They will be added with the restXml protocol. The following note can be ignored for this PR.

Reviewer Note

The current XML serializer builds a DOM tree in memory and only writes it to the output sink when serializer.flush() is called. This means any serializer.write_*() call must be followed by flush(). Our current HTTPRequestSerializer does not call flush() today because the smithy-json serializer streams directly to the sink instead of buffering. To support XML serialization, the generic ShapeSerializer must now be flushed at all call sites.

I explored implementing a streaming XML writer to avoid this but ran into an issue with XML attributes. In XML, members marked with @xmlAttribute must appear on the structure’s opening tag (e.g. <OperationInput attr="...">). In Smithy, however, attributes are modeled as normal structure members, so they may appear anywhere in the member list during serialization. As a result, the opening tag cannot be written until all attribute values are known, which effectively requires delaying writes until all members have been visited.

One possible solution is a two-pass approach: first serialize attribute members normally while sending others to a null serializer, then serialize non-attribute members while sending attribute members to a null serializer. This allows the opening tag and its attributes to be written before child elements. However, this introduces additional complexity compared to the current approach without a substantial benefit.

This creates some asymmetry with the current XML deserializer, which reads incrementally from the byte source rather than a DOM. I am open to switching the serializer to a streaming implementation if parity or avoiding the intermediate DOM is preferred.

Copy link
Contributor

@jonathan343 jonathan343 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to go over this a second time, but dropping some quick comments now since I might not be able to get back to this soon:

return self.document_value.get("scheme") # type: ignore


@dataclass(init=False, frozen=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking might be worth pulling this into a separate PR that targets develop so we can get this released early. My though process is:

  • If smithy-xml has a dependency on smithy-core (unbounded) we will have cases where people using versions that exist today such as (e.g., 0.3.0) which won't work.
  • It might be good to set a lower bound in smithy-xml like smithy-core>=<version>. However, to do this we'll need a specific version we know has these traits.

In reality, people using smithy-xml will be using it through some aws client that pins the proper dependencies. So I'm not too worried about it. But also if this is in a patch bump, the aws clients do things like ~= which still makes it possible for us to have weird cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to keep here for now so the code works and tests are passing. But something we may want to consider doing in parallel and then rebase later

Comment on lines +99 to +100
def is_null(self) -> bool:
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants