Tophat alignment and -g parameter
13 months ago
Sam ▴ 170

An alignment of 150 bp paired-end data took a very long time to run (more than a week). The log file showed that most of that time was spent in the "reporting output tracks" stage, so I tried limiting TopHat's output with "-g 1" (--max-multihits), on the assumption that reporting every combination of alignments for the paired reads was what took so long.

However, the alignment results are different between the two options.

For -g 20

Left reads:
Input   :  19149286
Mapped  :  18242457 (95.3% of input)
of these:  2389313 (13.1%) have multiple alignments (602054 have >20)

While for -g 1

Left reads:
Input    :  19149286
Mapped   :  18210924 (95.1% of input)
of these :  1422182 ( 7.8%) have multiple alignments (1434114 have >1)

The number of uniquely mapped reads is larger for -g 1 than for -g 20.
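The comparison can be made explicit with a little arithmetic. The sketch below assumes that "uniquely mapped" means the mapped count minus the reads reported as having multiple alignments; the names `runs` and `summarize` are introduced only for this illustration, and the numbers are copied from the two summaries above.

```python
# Left-read counts copied from the two TopHat summaries above.
runs = {
    "-g 20": {"input": 19149286, "mapped": 18242457, "multi": 2389313},
    "-g 1": {"input": 19149286, "mapped": 18210924, "multi": 1422182},
}


def summarize(stats):
    """Return (uniquely mapped reads, percent of input).

    Assumption: uniquely mapped = mapped - reads with multiple alignments.
    """
    unique = stats["mapped"] - stats["multi"]
    return unique, 100.0 * unique / stats["input"]


for label, stats in runs.items():
    unique, pct = summarize(stats)
    print(f"{label}: {unique} uniquely mapped ({pct:.1f}% of input)")

# Difference in uniquely mapped reads between the two runs.
diff = summarize(runs["-g 1"])[0] - summarize(runs["-g 20"])[0]
print(f"-g 1 reports {diff} more uniquely mapped reads than -g 20")
```

Under that assumption, -g 1 yields 16788742 uniquely mapped left reads against 15853144 for -g 20, even though the total number of mapped reads is slightly lower.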

If -g influences the alignment stage and not only the reporting stage, how does it do so? The same question was asked a long time ago on SEQanswers, in the thread "Tophat multiple alignment and mapping rates", but it was never resolved:

http://seqanswers.com/forums/showthread.php?t=44608

RNA-Seq tophat • 349 views

Hello Sam,

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the advisable toolset for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See the paper "Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown". There are also other alternatives, including alignment with STAR or BBMap, and pseudo-alignment with Salmon.

https://twitter.com/lpachter/status/937055346987712512

Thanks. I would still love to get an answer, if it is known.

13 months ago
Sam ▴ 170

Simon Andrews writes here:

if you run tophat with -g1 then all of the hits reported are given a mapping quality of 50, even if there are multiple perfect hits with the same strength in the file. It looks like tophat is calculating the mapq score after the output filtering is done, rather than before and is giving false confidence in hits which can actually be very poor.
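One way to check this on your own data is to tally the MAPQ column of the alignments from each run and compare the two distributions. The following is a minimal sketch, not part of TopHat: it assumes the alignments are available as SAM text lines (for example from `samtools view` on the BAM file), `mapq_histogram` is a helper name made up for this example, and the records in `example` are invented purely to show the input format.

```python
from collections import Counter


def mapq_histogram(sam_lines):
    """Count MAPQ values (SAM column 5) over mapped alignment records."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Bit 0x4 in the FLAG field marks an unmapped read.
        if flag & 0x4:
            continue
        counts[int(fields[4])] += 1
    return counts


# Invented records for illustration only; real input would come from the
# aligner's output.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r2\t16\tchr1\t200\t50\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r3\t0\tchr2\t300\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

hist = mapq_histogram(example)
for mapq in sorted(hist):
    print(f"MAPQ {mapq}: {hist[mapq]} alignments")
```

If the behaviour described above holds, the -g 1 run should show essentially every mapped read at MAPQ 50, while the -g 20 run should show a share of reads at lower values.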
