Long reads: fixing mate information and marking duplicates with samtools
20 days ago
Zeng Hao

Hi everyone,

I am trying to call structural variants (using svim) in my PacBio long-read sequencing dataset. However, I get vastly different numbers of variants (100,000 vs 1,000) depending on whether I use a BAM alignment (from ngmlr) converted directly with samtools, or one that I first processed with samtools fixmate and samtools markdup. The latter gives far fewer calls. (Workflow: https://www.htslib.org/workflow/fastq.html)

Is this normal? And are these steps necessary for this specific use case (SV calling)? (Frankly, I do not quite understand what impact samtools fixmate and samtools markdup might have.)

Thank you very much for your help.

[Edited for clarity]

Best regards,


16 days ago
aw7

I do not think samtools fixmate and markdup are going to work on PacBio long reads. fixmate sets and repairs mate information for read pairs, which (as far as I know) PacBio data does not have. markdup might do something useful with the right settings, but I do not know whether anyone has ever tested it properly.
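
One way to check this on your own data is to look at the SAM FLAG bits these tools deal with. The bit values below come from the SAM specification: 0x1 means the template has multiple segments (paired reads), and 0x400 is the duplicate mark that markdup sets. The function name `count_flags` is a hypothetical helper, not part of samtools. This is a minimal sketch that assumes plain SAM text lines such as those printed by `samtools view`. If no records are paired, fixmate has no mate information to work with. The duplicate count shows how many records markdup flagged.

```python
from collections import Counter
from typing import Iterable

# FLAG bit values as defined in the SAM specification.
PAIRED = 0x1            # template has multiple segments (paired reads)
SECONDARY = 0x100       # secondary alignment
DUPLICATE = 0x400       # PCR or optical duplicate (set by samtools markdup)
SUPPLEMENTARY = 0x800   # supplementary alignment (common for split long reads)


def count_flags(sam_lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count SAM records with selected FLAG bits set."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 of a SAM record is FLAG
        counts["records"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
        if flag & (SECONDARY | SUPPLEMENTARY):
            counts["secondary_or_supplementary"] += 1
    return counts


# Made-up records for illustration only: one plain, one duplicate, one supplementary.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read2\t1024\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "read3\t2048\tchr1\t500\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
]
print(dict(count_flags(example)))
```

On a real file, you would pass in the lines from `samtools view your.bam`. `samtools flagstat` also reports "paired in sequencing" and "duplicates" counts directly, which may be the quicker check.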

If they are not helping you, I would not use them.


