At the moment, preprocessing generates each metadata field independently from the input metadata submitted by the user or obtained from the Nextclade run. It is not configured to use a generated metadata field as input when creating another metadata field.
For example, `concatenate` takes country, date and accession as input and joins them to create a displayName field. The processed country field is produced by `process_options`, which checks that the value is a valid country. `concatenate` does not perform this check, so it accepts invalid countries. The only way to add this validation is to create a new function that combines the inputs and logic of both `concatenate` and `process_options`; this is what we had to do with `build_display_name` and my new `assign_custom_lineage` function.
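To make the gap concrete, here is a minimal sketch in Python. It does not reproduce the real preprocessing code: the function bodies, the signatures, the `VALID_COUNTRIES` list, the `process_independently` helper and the `"unknown"` fallback are all assumptions for illustration. Only the names `concatenate`, `process_options` and `build_display_name` come from the description above.

```python
# Hypothetical option list; the real pipeline has its own configuration.
VALID_COUNTRIES = {"Germany", "Switzerland"}


def process_options(value, options):
    """Return the value if it is an allowed option, otherwise None."""
    return value if value in options else None


def concatenate(parts):
    """Join the given parts into a single display string."""
    return "/".join(parts)


def process_independently(raw):
    """Every output field is computed from the raw input only."""
    return {
        "country": process_options(raw["country"], VALID_COUNTRIES),
        # concatenate sees the raw country, so an invalid value passes through.
        "displayName": concatenate([raw["country"], raw["date"], raw["accession"]]),
    }


def build_display_name(raw):
    """Workaround: one combined function that repeats the validation."""
    country = process_options(raw["country"], VALID_COUNTRIES)
    return concatenate([country or "unknown", raw["date"], raw["accession"]])
```

With an input country of `"Atlantis"`, the processed country field is rejected while the displayName built by `concatenate` still contains `Atlantis`; the combined function avoids that, at the cost of duplicating the validation logic.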
Should we instead restructure the pipeline so that metadata fields can be created from processed metadata fields, i.e. allow `concatenate` to take the processed country field as input instead of the unprocessed input field? This raises the question of how many layers of fields we can/should allow. Given input metadata fields I1 and I2, we currently support an output field A produced from I1 and I2. If we allow B to be produced from A and I2, we might then also want to create C from B and I1...
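One possible answer to the layering question is to not fix the number of layers at all, and instead order fields by their dependencies. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (3.9+). Everything else is an assumption for illustration and not the existing pipeline: the `input.` / `processed.` reference prefixes, the `SPECS` layout, the `process` and `resolve` helpers, and the simplified `validate_country` and `concatenate` functions.

```python
from graphlib import TopologicalSorter

VALID_COUNTRIES = {"Germany", "Switzerland"}  # hypothetical option list


def validate_country(country):
    """Simplified stand-in for a process_options-style check."""
    return country if country in VALID_COUNTRIES else None


def concatenate(*parts):
    """Join parts, replacing missing (None) values with 'unknown'."""
    return "/".join("unknown" if p is None else str(p) for p in parts)


# Hypothetical spec: output field -> (function, list of input references).
# "input.x" refers to raw input metadata, "processed.x" to another output field.
SPECS = {
    "displayName": (concatenate, ["processed.country", "input.date", "input.accession"]),
    "country": (validate_country, ["input.country"]),
}


def resolve(ref, raw, processed):
    """Look up a reference in the raw input or in the already processed fields."""
    namespace, _, name = ref.partition(".")
    if namespace == "input":
        return raw.get(name)
    if namespace == "processed":
        return processed[name]
    raise ValueError(f"unknown reference: {ref}")


def process(raw, specs):
    """Compute all output fields in dependency order, for any number of layers."""
    graph = {}
    for name, (_, refs) in specs.items():
        deps = {r.removeprefix("processed.") for r in refs if r.startswith("processed.")}
        missing = deps - specs.keys()
        if missing:
            raise KeyError(f"{name} depends on undefined fields: {sorted(missing)}")
        graph[name] = deps
    processed = {}
    # static_order() raises graphlib.CycleError if fields depend on each other.
    for name in TopologicalSorter(graph).static_order():
        func, refs = specs[name]
        processed[name] = func(*(resolve(r, raw, processed) for r in refs))
    return processed
```

With this shape, A from I1 and I2, B from A and I2, and C from B and I1 are all just entries in the spec; the only hard limit is that the dependencies must not form a cycle, which the sorter detects.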