Question: Samtools Count Paired-End Reads
gravatar for dfernan
5.9 years ago by
United States
dfernan640 wrote:


I used tophat to align paired-end reads from an rna-seq experiment and I obtained an accepted_hits.bam alignment file.

Using the accepted_hits.bam I'd like to count the number of properly aligned paired-end reads (i.e. both ends are aligned and the alignment make sense - something like that, not sure on the best definition here).

How should I count the number of properly-aligned paired-end fragments? I'd like to have the count in terms of fragments instead of reads (i.e., each fragment should be counted once if both paired end reads aligned properly).

Suggestions are welcome, I am new to processing paired-end data so thought asking here.

I just tried this:

Sample_5> samtools flagstat accepted_hits.bam 
46656617 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
46656617 + 0 mapped (100.00%:nan%)
46656617 + 0 paired in sequencing
24411125 + 0 read1
22245492 + 0 read2
19907436 + 0 properly paired (42.67%:nan%)
36273430 + 0 with itself and mate mapped
10383187 + 0 singletons (22.25%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

there seem to be some issue on how tophat reports the alignments but not sure here:

Also, let me see if I understand this output correctly: seems there are 46 MM reads (that is counting reads from end1 + reads from end2?)


paired-end samtools tophat • 7.7k views
ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by dfernan640

The flag field in the BAM file will contain the information you want. It is worth your time to read the SAM specifications: Look @ 2. FLAG: bitwise FLAG. Each bit is explained in the following table:

ADD REPLYlink written 5.9 years ago by Zev.Kronenberg11k

thanks! that is assuming tophat correctly uses such flag, i am looking more for experience here, seems that's not always the case... here they mention explicitly the following: According to an unanswered SEQAnswers question , TopHat might not be setting this flag correctly or perhaps this flag is used differently in TopHat...

ADD REPLYlink modified 5.9 years ago • written 5.9 years ago by dfernan640
gravatar for Ashutosh Pandey
5.9 years ago by
Ashutosh Pandey11k wrote:

Depending on your needs you should be using the following flags to extract the reads from bam files. Check samtools manual to learn how to extract the reads or there are many posts on biostar about the same.

99 - read paired, read mapped in proper pair, mate reverse strand, first in pair, 163 - read paired, read mapped in proper pair, mate reverse strand, second in pair , 83 and 147, (same as above i.e.properly paired reads but they map in the reverse)

I dont know how Tophat treats the reads that are aligned properly in terms of orientation but not in terms of the insert distance. It may happen that the original insert size between reads is 300 bp but these reads may part of different exons that tend to lie far away on the reference genome because there is a large intron between them. In this case, though the reads are paired properly or mapped properly but they may be flagged as improper by Tophat.

I wont considering removal of unpaired reads as the proper pairing can fail for may reasons: one of the pair maps in the repeat regions or sequencer messed up one of the reads in the pair etc. In case you need them, you should also include following flags:

73 - read paired, mate unmapped, first in pair, 137 - read paired, mate unmapped, second in pair, 89, 153 same as above but they map in the reverse.

ADD COMMENTlink modified 5.9 years ago • written 5.9 years ago by Ashutosh Pandey11k

thanks a lot, v helpful!

ADD REPLYlink written 5.9 years ago by dfernan640
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1150 users visited in the last hour