coverage differences between samtools depth and samtools mpileup
1
0
Entering edit mode
3.5 years ago

Hello there,

We're working on ONT long reads data. To check base frequencies on specific positions, we planned to use the mpileup function from samtools. We've also computed depth using samtools depth:

samtools mpileup -d 200000 -Q 0 -f reference barcode.bam  > barcode_mpileup.vcf
samtools depth -d 200000  barcode.bam > barcode.cov

However, we get coverage differencies for some positions using these two commands. And I dont realy get where these differences came from. See results: -samtools depth:

P3_GTE  1   9019
P3_GTE  2   11481
P3_GTE  3   13962
P3_GTE  4   14929
.......................
P3_GTE  20  45314
P3_GTE  21  61244
P3_GTE  22  96184
P3_GTE  23  115121
P3_GTE  24  126748
P3_GTE  25  131826
P3_GTE  26  134085
P3_GTE  27  137528
P3_GTE  28  134741
P3_GTE  29  138045
P3_GTE  30  127061

-samtools mpileup

P3_GTE 1 C 9019
P3_GTE 2 T 11481
P3_GTE 3 A 13962
P3_GTE 4 C 14929
......................
P3_GTE 20 G 45569
P3_GTE 21 C 61508
P3_GTE 22 T 97718
P3_GTE 23 G 116528
P3_GTE 24 T 127407
P3_GTE 25 G 132341
P3_GTE 26 C 135732
P3_GTE 27 C 138298
P3_GTE 28 T 139815
P3_GTE 29 T 141861
P3_GTE 30 G 143441

I've checked my bam using flagstat after consulting some github issues:

179524 + 0 in total (QC-passed reads + QC-failed reads)
3 + 0 secondary
1169 + 0 supplementary
0 + 0 duplicates
179524 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Can we suppose these differences are caused by these "supplementary" and "secondary" reads? What are these reads exaclty ? It is possible to exclude them from bam?

Thanks,stay safe and have a nice week,

Hadrien

samtools ONT long read • 1.1k views
ADD COMMENT
0
Entering edit mode

Check these mpileup arguments to only count reads having (or not having) the respective flags set. For flags, see https://broadinstitute.github.io/picard/explain-flags.html

--rf, --incl-flags STR|INT  required flags: skip reads with mask bits unset []
--ff, --excl-flags STR|INT  filter flags: skip reads with mask bits set
                                            [UNMAP,SECONDARY,QCFAIL,DUP]
ADD REPLY
0
Entering edit mode
3.5 years ago

thanks to your link and command, i tried:

samtools mpileup -d 200000 -Q 0 --ff 2176 -f reference.fasta barcode.bam > pileup.vcf

However still cant explain the depth differences, even without theses reads. Such as the position 30:

P3_GTE  30  127061 DEPTH
P3_GTE 30 G 142976 MPILEUP

Maybe an other method is more appropriate to single long read ONT data? OR there is a point I've misunderstood?

ADD COMMENT

Login before adding your answer.

Traffic: 2023 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6