Skip to content

feat(prepro): add custom lineage functions#6174

Open
anna-parker wants to merge 15 commits intomainfrom
add_custom_lineage_functions
Open

feat(prepro): add custom lineage functions#6174
anna-parker wants to merge 15 commits intomainfrom
add_custom_lineage_functions

Conversation

@anna-parker
Copy link
Copy Markdown
Contributor

@anna-parker anna-parker commented Mar 20, 2026

resolves #

thanks to claude for writing the tests!

Screenshot

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by appropriate, automated tests.
  • Any manual testing that has been done is documented (i.e. what exactly was tested?)

🚀 Preview: https://add-custom-lineage-functi.loculus.org

@claude claude bot added the preprocessing Issues related to the preprocessing component label Mar 20, 2026
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Mar 20, 2026
@anna-parker anna-parker force-pushed the add_custom_lineage_functions branch from 1457a0f to 67227c6 Compare March 20, 2026 14:10
@anna-parker anna-parker changed the base branch from main to fieldPresets March 20, 2026 14:11
@anna-parker anna-parker marked this pull request as ready for review March 22, 2026 11:40
@claude
Copy link
Copy Markdown
Contributor

claude bot commented Mar 22, 2026

Claude finished @anna-parker's task in 3m 10s —— View job


Code Review

  • Gather context and read changed files
  • Review processing_functions.py changes
  • Review test_assign_custom_lineage.py
  • Review values.yaml changes
  • Review test_metadata_processing_functions.py changes
  • Post final review

Good addition overall. The new is_variant function is clean and well-tested. A few issues worth addressing:

is_variant

  • Lines 1218–1219: direct input_data["key"] access will raise KeyError (not caught by the local except (ValueError, TypeError)) if the key is absent rather than None. Use .get(). → inline comment
  • Line 1241: error message says "non-numeric threshold value" but the threshold is computed — the actual non-numeric value is length or numMutations. → inline comment

assign_custom_lineage

  • Line 1310: if a segment has no reference_segN key in input (partial genome), extract_regex returns None, which ends up in extracted_subtypes. The reassortant check len(set(...)) > 1 would then count None as a distinct subtype and false-positive. Filtering out Nones fixes it. → inline comment

Tests

  • ARGS constant contains unrelated fields (is_insdc_ingest_group, submittedAt, ACCESSION_VERSION). → inline comment
  • No test for partial genomes (missing reference for some segments) — the most direct way to exercise the false-reassortant bug above. → inline comment

values.yaml

  • The explanatory comment about why totalSubstitutions is used (no private mutations in custom nextclade dataset) was dropped when rewriting the SARS-CoV-2 variant fields. → inline comment

@anna-parker anna-parker force-pushed the add_custom_lineage_functions branch from 9522172 to 7755a97 Compare March 27, 2026 12:37
@anna-parker anna-parker changed the base branch from fieldPresets to main March 27, 2026 12:38
@anna-parker anna-parker force-pushed the add_custom_lineage_functions branch from 7755a97 to deb7468 Compare March 27, 2026 12:39
@anna-parker anna-parker force-pushed the add_custom_lineage_functions branch from 1fe48cc to 93e1907 Compare March 27, 2026 14:35
@anna-parker anna-parker changed the base branch from main to minimizer_parser_separator March 27, 2026 14:36
@anna-parker anna-parker removed the preview Triggers a deployment to argocd label Mar 27, 2026
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Mar 27, 2026
@theosanderson
Copy link
Copy Markdown
Member

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1cdb1beddf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Base automatically changed from minimizer_parser_separator to main April 1, 2026 11:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

preprocessing Issues related to the preprocessing component preview Triggers a deployment to argocd

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants