Add __td_missing__ marker as default value for non-total TypedDict #19
LGTM. I put some more thought into this, and I believe this is the right design; it actually aligns our schema with our previous implementation of __td_missing__.
To me, this looks like the right approach because of the very nature of TypedDict. They are not a runtime type; they are only a construct to help people understand what those dicts contain. This is especially true of non-total typed dicts, which are a mere suggestion of what the dict should look like. I believe static type checkers may at least enforce them restrictively (rejecting a key that is not part of the annotations), but I'm not sure.
Building on that, Python itself does not allow you to put default values on a TypedDict. This PR builds on the same assumptions, and makes sense if you consider Python's own behavior.
If you are to instantiate a typed dictionary today with this definition:

    from typing import TypedDict

    class MyDict(TypedDict, total=False):
        my_field: str

If you instantiate this, you will just get an empty dict. And if you update this definition to have an additional field and instantiate it, you will still get an empty dict.
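A minimal sketch of that behavior, for illustration only. `MyDictV2` and `other_field` are hypothetical names standing in for the "updated" definition:

```python
from typing import TypedDict


class MyDict(TypedDict, total=False):
    my_field: str


# Hypothetical updated definition with one additional optional key.
class MyDictV2(TypedDict, total=False):
    my_field: str
    other_field: int


# Calling a TypedDict class builds a plain dict; no key is filled in.
v1 = MyDict()
v2 = MyDictV2()

print(v1, v2)            # {} {}
print(type(v1) is dict)  # True
print("my_field" in v2)  # False
```

Both versions produce the same empty dict, so code reading these values cannot tell which definition created them.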
So when working with a typed dict, you can never know what data you will get in there. Following this convention so that Avro schemas stay compatible from version to version makes sense. Code interacting with a non-total TypedDict should always be defensive; you cannot know for a fact that a value will be in there.
And to me, it's not the responsibility of the persistence framework to deal with that. Users of such data structures need to be careful.
So yes, I'm all in favor of this 👍 very good idea to extend the concept of __td_missing__! 💯
Problem
In LocalStack, we often have to serialize non-total TypedDict with optional keys. These structures are generated from the specs and receive weekly updates.
Since the specs aim at being backward compatible, when these structures change, they usually get new optional keys. AvailabilityZone in this change is an example.
Within Avro, a change like that is not backward compatible, as a new value without a default is being added. Therefore, this would require some sort of migration. However, we might argue that these types of changes are, in fact, backward compatible by default.
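To make the compatibility problem concrete, here is a rough sketch of the relevant Avro record resolution rule, written with plain Python dicts rather than an Avro library: a field that exists only in the reader schema must declare a default, otherwise old data cannot be read. The schemas, the `NewKey` field, and the `missing_defaults` helper are assumptions for illustration, not the schemas LocalStack generates.

```python
def missing_defaults(writer_schema: dict, reader_schema: dict) -> list:
    """Return reader fields that previously written data cannot satisfy.

    Simplified form of Avro's record resolution rule: a field present
    only in the reader schema must declare a default value.
    """
    written = {field["name"] for field in writer_schema["fields"]}
    return [
        field["name"]
        for field in reader_schema["fields"]
        if field["name"] not in written and "default" not in field
    ]


# Illustrative schema in use when the data was persisted.
old_schema = {
    "type": "record",
    "name": "AvailabilityZone",
    "fields": [{"name": "ZoneName", "type": "string"}],
}

# Illustrative schema after a spec update added a key without a default.
new_schema = {
    "type": "record",
    "name": "AvailabilityZone",
    "fields": [
        {"name": "ZoneName", "type": "string"},
        {"name": "NewKey", "type": "string"},  # hypothetical new key
    ],
}

print(missing_defaults(old_schema, new_schema))  # ['NewKey']
```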
Solution
With non-total TypedDict, we already extend the type of the keys with str. This allows the serialization layer to assign a sentinel value, __td_missing__, to the keys that are absent from the dictionary (to distinguish them from keys that are present with a None value).
The overall idea is to use the __td_missing__ marker as the default value for these TypedDict. The new Avro schemas will remain backward compatible, and the serializer will ignore those keys if not set.
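A minimal sketch of the idea, under stated assumptions: `encode` and `decode` are hypothetical helpers rather than the actual serializer, `NewKey` is an invented key, and the field schema is only one plausible shape. The union lists "string" first because Avro has traditionally required a union field's default to match the first branch.

```python
from typing import Optional, TypedDict

TD_MISSING = "__td_missing__"


class AvailabilityZone(TypedDict, total=False):
    ZoneName: Optional[str]
    NewKey: Optional[str]  # hypothetical key added by a later spec update


# Assumed shape of the Avro field for an optional key: the key type is
# extended with "string" and the sentinel becomes the default.
new_key_field = {
    "name": "NewKey",
    "type": ["string", "null"],
    "default": TD_MISSING,
}


def encode(value: dict, keys: list) -> dict:
    """Fill absent keys with the sentinel so every schema field has a value."""
    return {key: value.get(key, TD_MISSING) for key in keys}


def decode(record: dict) -> dict:
    """Drop sentinel entries so absent keys stay absent after a round trip."""
    return {key: val for key, val in record.items() if val != TD_MISSING}


keys = ["ZoneName", "NewKey"]
record = encode({"ZoneName": None}, keys)
print(record)          # {'ZoneName': None, 'NewKey': '__td_missing__'}
print(decode(record))  # {'ZoneName': None}
```

In this sketch a key explicitly set to None survives the round trip, while a key that was never set is dropped again on decode. Data written before `NewKey` existed would resolve to the default and decode the same way.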