Add __td_missing__ marker as default value for non-total TypedDict #19

Open
giograno wants to merge 2 commits into main from none-default-non-total-td

@giograno

Problem

In LocalStack, we often have to serialize non-total TypedDicts with optional keys. These structures are generated from the specs and receive weekly updates.

Since the specs aim at being backward compatible, when these structures change, they usually get new optional keys.
AvailabilityZone in this change is an example.

Within Avro, a change like that is not backward compatible, because a new field without a default is being added. Therefore, it would require some sort of migration. However, we might argue that these types of changes are, in fact, backward compatible by default.
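To make the compatibility problem concrete, here is a minimal sketch of the relevant Avro schema-resolution rule: a reader field that the writer did not write must declare a default. The helper can_read, the InstanceId field, and the placeholder default are assumptions made for illustration; a real Avro library checks much more than this (types, unions, aliases).

```python
def can_read(reader_fields: list[dict], writer_fields: list[dict]) -> bool:
    """Check only the missing-field rule: every reader field must either
    have been written or carry a default value."""
    written = {field["name"] for field in writer_fields}
    return all(
        field["name"] in written or "default" in field
        for field in reader_fields
    )


# Hypothetical record fields before and after a spec update.
old = [{"name": "InstanceId", "type": "string"}]
new_without_default = old + [{"name": "AvailabilityZone", "type": "string"}]
new_with_default = old + [
    # The default value here is only a placeholder.
    {"name": "AvailabilityZone", "type": "string", "default": ""}
]

print(can_read(new_without_default, old))  # False: old data cannot be read
print(can_read(new_with_default, old))     # True: the default fills the gap
```

Under this rule, adding an optional key only stays backward compatible when the new field comes with a default.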

Solution

With non-total TypedDicts, we already extend the type of each key with str. This allows the serialization layer to assign the sentinel value __td_missing__ to keys that are absent from the dictionary (to distinguish them from keys that are present with a None value).

The overall idea is to use the __td_missing__ marker as the default value for the keys of these TypedDicts. The new Avro schemas will remain backward compatible, and the serializer will ignore those keys if they are not set.
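The idea can be sketched roughly as follows. This is not the code from this PR: the Instance structure, the helper names (avro_fields, encode, decode), and the tiny type mapping are hypothetical, and only the __td_missing__ marker itself comes from the description above.

```python
from typing import Any, TypedDict, get_type_hints

TD_MISSING = "__td_missing__"  # sentinel named in the PR description

# Assumed, deliberately tiny mapping from Python types to Avro type names.
AVRO_TYPES = {str: "string", int: "long", bool: "boolean"}


class Instance(TypedDict, total=False):
    # Hypothetical generated structure; AvailabilityZone is the "new" key.
    InstanceId: str
    Port: int
    AvailabilityZone: str


def avro_fields(td: type) -> list[dict[str, Any]]:
    """Build Avro-style fields: each key is a union starting with string,
    so the marker is a valid default for it."""
    fields = []
    for name, py_type in get_type_hints(td).items():
        union = ["string"]
        avro_type = AVRO_TYPES[py_type]
        if avro_type not in union:  # Avro unions cannot repeat a type
            union.append(avro_type)
        fields.append({"name": name, "type": union, "default": TD_MISSING})
    return fields


def encode(td: type, value: dict) -> dict:
    """Fill every absent key with the marker before serialization."""
    return {name: value.get(name, TD_MISSING) for name in get_type_hints(td)}


def decode(record: dict) -> dict:
    """Drop keys that only carry the marker, restoring the sparse dict."""
    return {key: val for key, val in record.items() if val != TD_MISSING}
```

With these helpers, decode(encode(Instance, {"InstanceId": "i-1"})) returns the original sparse dictionary, and a key that is present with a None value survives the round trip because None is not the marker.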

@giograno giograno requested review from bentsku and purcell February 19, 2026 09:24
@giograno giograno self-assigned this Feb 19, 2026
@giograno giograno marked this pull request as ready for review February 19, 2026 09:25

@bentsku bentsku left a comment

LGTM. I put some more thought into this and I believe it is the right design; it actually aligns our schema with our previous implementation of __td_missing__.

To me, this looks like the right approach because of the very nature of TypedDict. They are not a runtime type; they are only a construct to help people understand what those dicts contain. This is especially true of non-total typed dicts, which are a mere suggestion of what the dict should look like. I believe static type checkers can at least enforce them restrictively (flagging a key that is not part of the annotations), but I'm not sure.

Building on top of that, Python itself does not allow you to put default values on a TypedDict. This PR builds on those assumptions, and it makes sense if you consider Python's own behavior.

Consider a typed dictionary with this definition:

from typing import TypedDict

class MyDict(TypedDict, total=False):
    my_field: str

If you instantiate this with no arguments, you will just get an empty dict. And if you update this definition with an additional field and instantiate it again, you will still get an empty dict.
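A quick way to check this behavior (MyDictV2 and new_field are hypothetical names standing in for the updated definition):

```python
from typing import TypedDict


class MyDict(TypedDict, total=False):
    my_field: str


class MyDictV2(TypedDict, total=False):
    # Same structure after an update that adds one more optional key.
    my_field: str
    new_field: int


first = MyDict()
second = MyDictV2()

print(first, second)        # {} {}
print(type(first) is dict)  # True: at runtime this is a plain dict
```

The extra annotation changes what a type checker accepts, but not what the dictionary contains at runtime.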

So when working with typed dicts, you can never know what data you will get in there. Following this convention so that Avro schemas stay compatible from version to version makes sense. Code interacting with a non-total TypedDict should always be defensive enough, because you cannot know for a fact that a value will be in there.

And to me, it's not the responsibility of the persistence framework to deal with that. Users of such data structures need to be careful.
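As an illustration of what defensive access can look like, the sketch below checks for the key before reading it. The describe function is a hypothetical consumer, not code from this repository.

```python
from typing import TypedDict


class MyDict(TypedDict, total=False):
    my_field: str


def describe(d: MyDict) -> str:
    """Read my_field without assuming the key exists."""
    if "my_field" not in d:
        # An absent key is a different case from a key set to None.
        return "my_field is absent"
    return f"my_field={d['my_field']!r}"


print(describe({}))                   # my_field is absent
print(describe({"my_field": "abc"}))  # my_field='abc'
```

Testing membership rather than calling d.get("my_field") keeps the absent case separate from a present None value, which is the same distinction the __td_missing__ marker preserves during serialization.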

So yes, I'm all in favor of this 👍 very good idea to extend the concept of __td_missing__! 💯

2 participants