
added prefetch and minor updates to speedup pileup #1986

Open
vasudeva8 wants to merge 2 commits into samtools:develop from vasudeva8:pileup_1

Conversation

@vasudeva8
Contributor

Added prefetch to improve pileup performance.
Updated overlap_remove to reduce hash search.

This is for the samtools mpileup speed-up (PR samtools#2301).

sam.c Outdated
lbnode_t **pptr = &iter->head;
while (*pptr != iter->tail) {
    if ((*pptr)->next)
        __builtin_prefetch((*pptr)->next);
Contributor

I think maybe we should have an hts_prefetch macro defined somewhere with compiler checks. __builtin_prefetch is a GCC extension, albeit one supported by many other compilers too.

I'll push a new commit containing autoconf changes.

This is to facilitate the use of prefetch on deep pileups to avoid
memory latency.

On a Covid sample (local 34005_1#18.fixmate.bam file) of 150-200k
depth, the perf report -n stats for bam_plp64_next are:

    prefetch  16523
    volatile  16839
    none      17895

These are the 5th-best times of 10 runs, showing an 8% gain for
prefetching and a 6% gain for the poor-man's volatile alternative.
On shallow data it is not beneficial, costing ~2% instead.
(We may wish to consider removing the volatile case as it's a little
more costly on shallow data and a little less beneficial on deep, but
it's going to be rarely used regardless.)

NB: speed has not been checked on ARM CPUs.

Also added a check for __builtin_clz, which we already use elsewhere
via compiler version checks, as we may as well add both.