Basic Question : #Reads Used By Tophat
1
1
Entering edit mode
12.4 years ago
Abhi ★ 1.6k

Hey Guys

I have a basic question in trying to make sense of the number of reads mapped by tophat. My understanding is the following

#input_reads (to tophat) = #unmapped_reads(by tophat) + #primary_mapped reads

In my case the above equation is not working out as #primarymapped reads + #unmappedreads is coming out to be more than #input_reads. My hunch is that either the flag is not correctly set in the bam file by tophat or my equation/understanding of how the tophat uses & flags the reads is not right.

Also I am calculating the #primary reads by using the sam flag -F 256

Thanks! -Abhi

rna-seq cufflinks transcriptome • 2.6k views
ADD COMMENT
1
Entering edit mode

Did you check if your equation works if you set --max-multihits to 1?

ADD REPLY
0
Entering edit mode

No I dint try that until now

ADD REPLY
0
Entering edit mode
12.4 years ago
Arun 2.4k

What I'd do is to get the total number of reads in the sam file corresponding to every fastq header (or read id). If you have a paired end read (PE), one would expect this count to be not more than 2( and <= 1 for SE). You can isolate those reads that are represented >2 and then check if it is due to multiple hits as Federico points out or something else.

ADD COMMENT
0
Entering edit mode

I see you point..for now I was trying to do it with bitwise flag from the bam file. Counting based on read head is a bit memory intensive but worth as a check. --Thanks

ADD REPLY

Login before adding your answer.

Traffic: 1888 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6