Question: TopHat2 number of mapped reads?
samuelrivero50 wrote:

Hello

I am new to RNA-seq. I am using TopHat2 to map single-end reads to mm9, with the following command:

tophat -p 10 --max-multihits 1 -G genes_mm9.gff -o output genome_mm9 reads.fastq

With --max-multihits 1, I assume I will get at most one alignment per read. If so, the total number of reads that TopHat2 uses for mapping (19196075) should equal the number of alignments in the accepted_hits.bam file (8797938) plus the number of reads in the unmapped.bam file (7538885). But that is not the case: 2859252 reads are missing. Is my assumption correct?
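The accounting in the question can be checked with a few lines of arithmetic. This is only a sketch using the three counts quoted above; the variable names are illustrative, and the one-record-per-mapped-read assumption is the poster's, not something the TopHat manual guarantees.

```python
# Read counts quoted in the question.
reads_used = 19_196_075     # reads TopHat2 kept for mapping
accepted_hits = 8_797_938   # alignment records in accepted_hits.bam
unmapped = 7_538_885        # reads in unmapped.bam

# Assumption (the poster's): with --max-multihits 1, every mapped read
# contributes exactly one record to accepted_hits.bam.
accounted = accepted_hits + unmapped
missing = reads_used - accounted

print(f"accounted for: {accounted}")
print(f"missing:       {missing}")
```

The difference comes out to 2859252, the same gap reported in the question.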

Thank you for your help

Samuel

rna-seq alignment next-gen • 1.7k views
written 5.4 years ago by samuelrivero50

Were all the reads the same length? TopHat2 will filter out reads that are too short.

written 5.4 years ago by Devon Ryan91k

19196075 is the number of reads used for mapping, after filtering.

left_kept_reads.info file:

reads_in =19197489
reads_out=19196075

TopHat2 filtered out only 1414 reads.
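As a quick check of that figure, the two lines from left_kept_reads.info can be parsed and subtracted. This is a minimal sketch that assumes the file contains `key=value` lines like the two shown; `parse_kept_reads_info` is a hypothetical helper, and the real file may hold other fields that this ignores.

```python
def parse_kept_reads_info(text):
    """Parse 'key=value' lines into a dict of ints, skipping anything else."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value.strip())
    return counts


# The two lines quoted from left_kept_reads.info.
info_text = """reads_in =19197489
reads_out=19196075
"""

counts = parse_kept_reads_info(info_text)
filtered = counts["reads_in"] - counts["reads_out"]
print(f"reads filtered before mapping: {filtered}")
```

So pre-mapping filtering accounts for 1414 reads, far fewer than the 2859252 in question.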

modified 13 days ago by RamRS24k • written 5.4 years ago by samuelrivero50

Comment deleted. 

modified 5.4 years ago • written 5.4 years ago by Ashutosh Pandey11k

Thanks Ashutosh, that was my first thought. But the TopHat manual is not really clear to me on this point. According to the manual:

-g/--max-multihits <int> Instructs TopHat to allow up to this many alignments to the 
                         reference for a given read, and choose the alignments based on 
                         their alignment scores if there are more than this number. The
                         default is 20 for read mapping. Unless you use
                         --report-secondary-alignments, TopHat will report the
                         alignments with the best alignment score. If there are more
                         alignments with the same score than this number, TopHat will
                         randomly report only this many alignments. In case of using
                         --report-secondary-alignments, TopHat will try to report
                         alignments up to this option value, and TopHat may randomly
                         output some of the alignments with the same score to meet 
                         this number.

With -g/--max-multihits 1, my understanding is that TopHat will report the best alignment for each read, or randomly select one alignment if several share the same best score.
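One way to test this interpretation is to compare the number of alignment records with the number of distinct read names. The sketch below works on SAM-format text lines, such as those produced by `samtools view`; `summarize_sam` is a hypothetical helper, the sample records are made up for illustration, and treating distinct read names as distinct reads assumes single-end data as in this thread.

```python
def summarize_sam(lines):
    """Return (alignment records, distinct read names, secondary records)."""
    records = 0
    secondary = 0
    names = set()
    for line in lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue
        records += 1
        names.add(fields[0])          # column 1: QNAME (read name)
        if int(fields[1]) & 0x100:    # column 2: FLAG; bit 0x100 marks a secondary alignment
            secondary += 1
    return records, len(names), secondary


# Made-up records for illustration only: read2 appears twice,
# once as a primary and once as a secondary alignment.
sample = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "read1\t0\tchr1\t100\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t16\tchr1\t200\t50\t4M\t*\t0\t0\tACGT\tIIII\n",
    "read2\t272\tchr2\t300\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
]

records, reads, n_secondary = summarize_sam(sample)
print(f"records={records} distinct_reads={reads} secondary={n_secondary}")
```

If the interpretation holds, running the same function over the text of accepted_hits.bam (for example by passing it `sys.stdin` with `samtools view accepted_hits.bam` piped in) should give equal record and distinct-read counts. Note that the set of read names is held in memory.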

Maybe my interpretation is wrong.

Thanks

modified 13 days ago by RamRS24k • written 5.4 years ago by samuelrivero50

Actually, my interpretation of that parameter was wrong. I will delete my explanation above so that other people don't get confused.

modified 13 days ago by RamRS24k • written 5.4 years ago by Ashutosh Pandey11k