Entering edit mode
5.4 years ago
from the mountains
▴
250
i want to extract all paired reads from a bam file but for some reason i'm getting a much smaller file than expected.
$ samtools flagstat in.bam
56631731 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
417735 + 0 supplementary
0 + 0 duplicates
56631731 + 0 mapped (100.00% : N/A)
56213996 + 0 paired in sequencing
28112924 + 0 read1
28101072 + 0 read2
55820773 + 0 properly paired (99.30% : N/A)
56213996 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
380483 + 0 with mate mapped to a different chr
380483 + 0 with mate mapped to a different chr (mapQ>=5)
flagstat indicates to me that i should get 55820773 paired reads, or about 28M pairs of reads. Here is my command to extract:
$ samtools fastq -1 paired1.fastq.gz -2 paired2.fastq.gz -0 /dev/null -s /dev/null -n in.bam
$ zcat paired1.fastq.gz | wc -l
2885736
There are only about 700,000 reads in the first fastq file. I am trying to get rid of singletons. by the way the bam was generated by STAR, and subsequently sorted, indexed and filtered by samtools before using samtools fastq.
that ended up being the solution. thanks for your expertise--this kind of thing always takes like 45 minutes of my time to figure out and somehow i never retain the information i've learned.
That is why biostars is here. If you don't do this regularly there is no way to retain this sort of info :-)
Accept the answer (green check mark) to provide closure to the thread.
thanks, i did not know what that button was for!