Question: Does a BAM File Include Unmapped Reads as Well?
Jordan (Pittsburgh) wrote, 5.4 years ago:

Hi,

I have a rookie question. I was using samtools flagstat to check the statistics of some BAM files. In the output, the number of QC-passed reads is sometimes larger than the number of mapped reads. My understanding is that BAM files include only mapped reads. Do they contain unmapped reads too?

For example:

```
$ samtools flagstat file.bam
257823892 + 0 in total (QC-passed reads + QC-failed reads)
132531248 + 0 duplicates
209402202 + 0 mapped (81.22%:nan%)
257823892 + 0 paired in sequencing
128911946 + 0 read1
128911946 + 0 read2
152678438 + 0 properly paired (59.22%:nan%)
48421690 + 0 singletons (18.78%:nan%)
3565988 + 0 with mate mapped to a different chr
1316058 + 0 with mate mapped to a different chr (mapQ>=5)
```

Here the mapping rate is 81.22%. If BAM files contained only mapped reads, I would expect 100% mapped. Can anyone help me understand this? I tried looking online but had no luck.

The BAM file was generated by LifeScope from paired-end SOLiD reads.

Thanks!

Tags: read, bam, mapping
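To see where the 81.22% comes from, here is a small sketch that redoes flagstat's arithmetic from the QC-passed counts copied out of the output above. It assumes one record per read (no secondary alignments), and the names `summarize`, `TOTAL`, `MAPPED` and `SINGLETONS` are made up for illustration. The percentage is mapped / total over every record in the file, so the other 18.78% are records flagged as unmapped; the `nan%` is the same ratio for the QC-failed column, which is 0/0 here.

```python
# QC-passed counts copied from the flagstat output in the question.
TOTAL = 257_823_892      # "in total"
MAPPED = 209_402_202     # "mapped"
SINGLETONS = 48_421_690  # "singletons"


def summarize(total, mapped, singletons):
    """Derive the figures flagstat prints (or implies) from the raw counts."""
    unmapped = total - mapped  # records in the file with the 0x4 flag set
    return {
        "unmapped": unmapped,
        "pct_mapped": round(100 * mapped / total, 2),
        "pct_singletons": round(100 * singletons / total, 2),
        # Equal counts are consistent with every unmapped record being
        # the mate of a mapped singleton.
        "unmapped_equals_singletons": unmapped == singletons,
    }


if __name__ == "__main__":
    for key, value in summarize(TOTAL, MAPPED, SINGLETONS).items():
        print(f"{key}: {value}")
```

For these numbers the unmapped count is 48,421,690, exactly the singleton count. That fits a file that keeps the unmapped mate of every pair where only one read mapped, though the arithmetic alone does not prove it.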

I guess that with LifeScope, read pairs where both reads remain unaligned end up in the unmapped.bam file. I think you have the option to choose what to do with the unmapped reads. But the mapped BAM file will contain both reads of a pair in which one read mapped and the other failed to map.

(Reply by Ashutosh Pandey, 5.4 years ago)

Oh, I see. In that case, "singletons" means all the single reads that failed to map, and the number 209,402,202 means all the pairs that were mapped. Is that right?

(Reply by Jordan, 5.4 years ago)
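For reference, samtools flagstat's "singletons" line counts reads that are mapped but whose mate is unmapped (not reads that failed to map), and "mapped" counts individual reads rather than pairs. The sketch below buckets a single record by its SAM FLAG bits in roughly that way; it is a simplification that ignores secondary, supplementary and QC-fail bits, and the function name `classify` and the flag values 73/133 are illustrative choices, not taken from the thread.

```python
# SAM FLAG bits used here (SAM format specification).
PAIRED = 0x1
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify(flag):
    """Bucket one record roughly the way samtools flagstat counts it.

    Simplified sketch: ignores secondary/supplementary and QC-fail bits.
    """
    if flag & UNMAPPED:
        return "unmapped"
    if flag & PAIRED and flag & MATE_UNMAPPED:
        return "singleton"  # this read mapped, but its mate did not
    if flag & PAIRED:
        return "mapped with mate mapped"
    return "mapped, unpaired"


if __name__ == "__main__":
    # Hypothetical pair: read 1 mapped (flag 73), read 2 unmapped (flag 133).
    for flag in (73, 133):
        print(flag, classify(flag))
```

So a pair with only one mapped read contributes one record to "mapped" and "singletons" and one unmapped record that is not counted as mapped.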
swbarnes2 (United States) wrote, 5.4 years ago:

Some aligners leave out unmapped reads; I think Bowtie does that by default. Others, like bwa, keep them. I guess LifeScope keeps them.
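Whether a given file kept its unmapped reads can be checked from the FLAG column: on a real BAM, `samtools view -c -f 4 file.bam` counts records with the "segment unmapped" bit (0x4) set. The sketch below does the same count over SAM text lines such as those printed by `samtools view`; the read names, coordinates and flags in the example are hypothetical. A result of 0 suggests the aligner (or a later step) dropped unmapped reads.

```python
def count_unmapped(sam_lines):
    """Count SAM records whose FLAG has bit 0x4 ("segment unmapped") set."""
    n = 0
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip blank and header lines
            continue
        flag = int(line.split("\t")[1])  # FLAG is the 2nd tab-separated field
        if flag & 0x4:
            n += 1
    return n


if __name__ == "__main__":
    # Hypothetical records: readA is half-mapped, readB has both mates unmapped.
    records = [
        "@HD\tVN:1.6\tSO:unsorted",
        "readA\t73\tchr1\t100\t60\t50M\t=\t100\t0\t*\t*",
        "readA\t133\tchr1\t100\t0\t*\t=\t100\t0\t*\t*",
        "readB\t77\t*\t0\t0\t*\t*\t0\t0\t*\t*",
        "readB\t141\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    ]
    print(count_unmapped(records))
```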

Note that samtools flagstat only reads the flags. It does not make any QC decisions, duplicate calls, or anything like that itself. There is a flag for "failed QC", but that doesn't mean the software you used actually assessed it. So you can't take those flagstat lines at face value: they only mean something if you ran software that would have set those flags correctly in your .bam. If you are sure you should have more reads, maybe LifeScope did an internal QC and dropped the bad reads.
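To make the "it only reads the flags" point concrete, here is a minimal sketch of how the two columns of a line like "132531248 + 0 duplicates" can arise: records are split on the QC-fail bit (0x200) and duplicates are whatever records carry bit 0x400. Nothing in it judges quality or finds duplicates; it only counts bits that some earlier tool would have had to set. The function name `tally` and the example flags are illustrative, and real flagstat reports many more categories.

```python
QCFAIL = 0x200  # "not passing quality controls"
DUP = 0x400     # "PCR or optical duplicate"


def tally(flags):
    """Count records into flagstat-style (QC-passed, QC-failed) columns.

    Only the bits are inspected; nothing here judges read quality or
    detects duplicates.
    """
    total = [0, 0]
    duplicates = [0, 0]
    for flag in flags:
        col = 1 if flag & QCFAIL else 0
        total[col] += 1
        if flag & DUP:
            duplicates[col] += 1
    return {"total": tuple(total), "duplicates": tuple(duplicates)}


if __name__ == "__main__":
    # No upstream tool set 0x200, so everything lands in the QC-passed column.
    print(tally([99, 147, 99 | DUP, 147 | DUP]))
```

Here the result is `{'total': (4, 0), 'duplicates': (2, 0)}`: a "+ 0" in the QC-failed column only means no record had 0x200 set, not that QC was performed and everything passed.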

Tomáš Beluský (Brno) wrote, 5.4 years ago:

It also stores unmapped reads; you can tell which ones they are from the flags. See: http://picard.sourceforge.net/explain-flags.html
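A FLAG value is just a sum of bits, so it can also be decoded locally. The sketch below lists which bits are set in a given value; the labels paraphrase the SAM format specification rather than copying any particular tool's wording, and `explain_flag` is a made-up name. An unmapped read is any record whose decoded list contains "read unmapped" (bit 0x4).

```python
# FLAG bits as defined in the SAM format specification (labels paraphrased).
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails platform/vendor quality checks"),
    (0x400, "read is PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def explain_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


if __name__ == "__main__":
    for flag in (73, 133, 77):
        print(flag, explain_flag(flag))
```

For example, 133 decodes to "read paired", "read unmapped" and "second in pair": an unmapped read 2 whose mate mapped.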


The total number of sequenced reads was 324 million, but the BAM file has 257 million reads overall. How were these 257 million reads selected, then?

(Reply by Jordan, 5.4 years ago)

It looks like QC-failed reads were removed.

(Reply by Tomáš Beluský, 5.4 years ago)
ammar.husami wrote, 8 days ago:

As for the question "Does a BAM file have unmapped reads too?": keeping the unaligned reads is an option. Some workflows keep them and others don't, depending on their use case, so the answer depends on the file you have at hand.

Powered by Biostar version 2.3.0