body_stream, condition, column_names, type_cast, startup_code
`evaluate_stream` receives the full, unmodified stream of pairs, yet `column_names` comes from the modified header, and `column_scheme` likewise refers to the reduced list of columns. This can cause a "silent" bug: the pairs appear to have been filtered, but the conditions were evaluated against the wrong columns, so not all of them are actually met.
Example:
Say we start with a pairs file with the columns:
#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type walk_pair_index walk_pair_type read_len1 read_len2 mapq1 mapq2 ...
and run with `--remove-columns read_len1,read_len2`. Then any filtering expression referring to `mapq1`/`mapq2` would actually use the values from the columns corresponding to `read_len1`/`read_len2`, leading to incorrect results.
pairtools/pairtools/cli/select.py
Line 230 in 7e69d6c
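To make the misalignment concrete, here is a minimal standalone sketch. It is not the pairtools implementation: `lookup` is a hypothetical helper, the row values are made up, and the `>= 30` threshold is just an illustrative condition. It only shows what happens when names from a reduced header are used to index fields of an unmodified row.

```python
# Illustrative sketch only; none of this is pairtools code.

# Columns as listed in the example header (trailing "..." omitted).
full_columns = [
    "readID", "chrom1", "pos1", "chrom2", "pos2", "strand1", "strand2",
    "pair_type", "walk_pair_index", "walk_pair_type",
    "read_len1", "read_len2", "mapq1", "mapq2",
]

# Header after removing read_len1 and read_len2.
removed = {"read_len1", "read_len2"}
reduced_columns = [c for c in full_columns if c not in removed]

# A made-up, unmodified row: read lengths are 150, both MAPQs are 0.
row = ["r1", "chr1", "100", "chr2", "200", "+", "-", "UU",
       "1", "R1-2", "150", "150", "0", "0"]


def lookup(name, column_names, fields):
    """Hypothetical helper: fetch a field by its position in column_names."""
    return fields[column_names.index(name)]


# Names resolved against the reduced header, values taken from the full row:
mapq1_misaligned = lookup("mapq1", reduced_columns, row)  # picks read_len1
# Names resolved against the header that matches the row:
mapq1_aligned = lookup("mapq1", full_columns, row)

# An illustrative condition such as "mapq1 >= 30":
passes_misaligned = int(mapq1_misaligned) >= 30  # passes, using read_len1
passes_aligned = int(mapq1_aligned) >= 30        # fails, using real mapq1
```

In this sketch the pair with `mapq1` of 0 passes the filter because the read length is compared instead, and no error is raised. That matches the "silent" behavior described above.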