BAM to FASTQ discards most reads as singletons though flagstat says otherwise
1
3
Entering edit mode
3.7 years ago
vkkodali_ncbi ★ 3.7k

I have a bam file that returns the following when I run samtools flagstat:

37353420 + 61518 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
9068558 + 0 duplicates
29533570 + 48760 mapped (79.07% : 79.26%)
37353420 + 61518 paired in sequencing
18676710 + 30759 read1
18676710 + 30759 read2
25268546 + 41600 properly paired (67.65% : 67.62%)
27671498 + 45600 with itself and mate mapped
1862072 + 3160 singletons (4.99% : 5.14%)
1184710 + 1916 with mate mapped to a different chr
1023122 + 1644 with mate mapped to a different chr (mapQ>=5)

However, when I run samtools fastq to extract these reads in FASTQ format, most of the reads are discarded as singletons, why?

$ samtools fastq -1 reads.1.fq -2 reads.2.fq -s unpaired.fq --reference genome.fna input.sorted.bam 
[M::bam2fq_mainloop] discarded 36407186 singletons
[M::bam2fq_mainloop] processed 37414938 reads
samtools bam • 2.5k views
ADD COMMENT
1
Entering edit mode

is the sorted bam file sorted on name or position?

ADD REPLY
3
Entering edit mode
3.7 years ago
vkkodali_ncbi ★ 3.7k

They are position-sorted which is the samtools sort default. I did not see any mention of the bam file being queryname-sorted. Sorting the bam file using samtools sort -n in.bam -o out.querysort.bam and then using samtools fastq returns the expected result.

ADD COMMENT
0
Entering edit mode

so this issue is resolved now?

ADD REPLY
0
Entering edit mode

Yup, I get the correct set of data now. Thank you!

ADD REPLY
0
Entering edit mode

I moved your comment to an answer, now you can accept it as resolved (this helps to keep the site structured and clean ;)

ADD REPLY

Login before adding your answer.

Traffic: 2069 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6