Question: having trouble extracting paired reads from bam file
0
gravatar for from the mountains
5 months ago by
United States
from the mountains110 wrote:

i want to extract all paired reads from a bam file but for some reason i'm getting a much smaller file than expected.

$ samtools flagstat in.bam
56631731 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
417735 + 0 supplementary
0 + 0 duplicates
56631731 + 0 mapped (100.00% : N/A)
56213996 + 0 paired in sequencing
28112924 + 0 read1
28101072 + 0 read2
55820773 + 0 properly paired (99.30% : N/A)
56213996 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
380483 + 0 with mate mapped to a different chr
380483 + 0 with mate mapped to a different chr (mapQ>=5)

flagstat indicates to me that i should get 55820773 paired reads, or about 28M pairs of reads. Here is my command to extract:

$ samtools fastq -1 paired1.fastq.gz -2 paired2.fastq.gz -0 /dev/null -s /dev/null -n in.bam
$ zcat paired1.fastq.gz | wc -l
2885736

There are only about 700,000 reads in the first fastq file. I am trying to get rid of singletons. by the way the bam was generated by STAR, and subsequently sorted, indexed and filtered by samtools before using samtools fastq.

bam samtools alignment • 200 views
ADD COMMENTlink modified 5 months ago • written 5 months ago by from the mountains110
2
gravatar for ATpoint
5 months ago by
ATpoint41k
Germany
ATpoint41k wrote:

You need to sort the bam by name or group by name (samtools sort -n or samtools collate) before writing as fastq.

ADD COMMENTlink written 5 months ago by ATpoint41k

that ended up being the solution. thanks for your expertise--this kind of thing always takes like 45 minutes of my time to figure out and somehow i never retain the information i've learned.

ADD REPLYlink written 5 months ago by from the mountains110

That is why biostars is here. If you don't do this regularly there is no way to retain this sort of info :-)

Accept the answer (green check mark) to provide closure to the thread.

ADD REPLYlink written 5 months ago by genomax92k

thanks, i did not know what that button was for!

ADD REPLYlink modified 5 months ago • written 5 months ago by from the mountains110
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1867 users visited in the last hour