I used tophat to align paired-end reads from an rna-seq experiment and I obtained an accepted_hits.bam alignment file.
Using the accepted_hits.bam I'd like to count the number of properly aligned paired-end reads (i.e. both ends are aligned and the alignment make sense - something like that, not sure on the best definition here).
How should I count the number of properly-aligned paired-end fragments? I'd like to have the count in terms of fragments instead of reads (i.e., each fragment should be counted once if both paired end reads aligned properly).
Suggestions are welcome, I am new to processing paired-end data so thought asking here.
I just tried this:
Sample_5> samtools flagstat accepted_hits.bam 46656617 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 duplicates 46656617 + 0 mapped (100.00%:nan%) 46656617 + 0 paired in sequencing 24411125 + 0 read1 22245492 + 0 read2 19907436 + 0 properly paired (42.67%:nan%) 36273430 + 0 with itself and mate mapped 10383187 + 0 singletons (22.25%:nan%) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)
there seem to be some issue on how tophat reports the alignments but not sure here: http://seqanswers.com/forums/showthread.php?t=8186.
Also, let me see if I understand this output correctly: seems there are 46 MM reads (that is counting reads from end1 + reads from end2?)