having trouble extracting paired reads from bam file
1
0
Entering edit mode
3.9 years ago

i want to extract all paired reads from a bam file but for some reason i'm getting a much smaller file than expected.

$ samtools flagstat in.bam
56631731 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
417735 + 0 supplementary
0 + 0 duplicates
56631731 + 0 mapped (100.00% : N/A)
56213996 + 0 paired in sequencing
28112924 + 0 read1
28101072 + 0 read2
55820773 + 0 properly paired (99.30% : N/A)
56213996 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
380483 + 0 with mate mapped to a different chr
380483 + 0 with mate mapped to a different chr (mapQ>=5)

flagstat indicates to me that i should get 55820773 paired reads, or about 28M pairs of reads. Here is my command to extract:

$ samtools fastq -1 paired1.fastq.gz -2 paired2.fastq.gz -0 /dev/null -s /dev/null -n in.bam
$ zcat paired1.fastq.gz | wc -l
2885736

There are only about 700,000 reads in the first fastq file. I am trying to get rid of singletons. by the way the bam was generated by STAR, and subsequently sorted, indexed and filtered by samtools before using samtools fastq.

bam samtools alignment • 801 views
ADD COMMENT
2
Entering edit mode
3.9 years ago
ATpoint 82k

You need to sort the bam by name or group by name (samtools sort -n or samtools collate) before writing as fastq.

ADD COMMENT
0
Entering edit mode

that ended up being the solution. thanks for your expertise--this kind of thing always takes like 45 minutes of my time to figure out and somehow i never retain the information i've learned.

ADD REPLY
0
Entering edit mode

That is why biostars is here. If you don't do this regularly there is no way to retain this sort of info :-)

Accept the answer (green check mark) to provide closure to the thread.

ADD REPLY
0
Entering edit mode

thanks, i did not know what that button was for!

ADD REPLY

Login before adding your answer.

Traffic: 2580 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6