What Happened To -K In Tophat For Multiple-Mapping Reads?
8.6 years ago
gaelgarcia05 ▴ 270

Selecting -g n in tophat does not discard reads that map to more than n locations; instead, it reports only n alignments for those reads, chosen from among all their top-scoring alignments.

I think there used to be an option -k that would allow one to discard reads that topped x alignments -- whatever happened to that? I only see -g in the tophat 2 manual, no reporting options like before...

multiple-alignment tophat2 rna-seq tophat • 2.7k views
7.6 years ago
Dan D 7.2k

You're correct that it appears to be gone, as it's now an unrecognized option on the command line. If I had to hazard a guess, I would speculate that it's because it's tricky to know what to do with the discarded reads. They're certainly not "unmapped," but then do you make another BAM for the "abundant" reads? Either that or there was a change to the algorithm where concurrency considerations made it difficult to track the total number of alignments for a given read until later on in the process, where any efficiency gains would be wiped out.

Fortunately, you can easily remove these reads downstream of tophat using BAMTools filter. For example, if you wanted to remove any read which mapped 20 or more times (that is, keep only reads whose NH tag is below 20), you could supply the following JSON to the tool:

{
"tag" : "NH:<20"
}
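
If you'd rather not go through BAMTools, the same NH-based filter can be sketched in plain Python over SAM text (for example, what `samtools view -h` prints). This is only a rough sketch under a few assumptions: the function names `nh_value` and `filter_sam` are made up for illustration, the NH tag is assumed to be present as an integer `NH:i:` field, and records with no NH tag are kept rather than dropped, which you may want to change.

```python
def nh_value(sam_line):
    """Return the NH tag (number of reported alignments) as an int, or None if absent."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM columns are mandatory; optional tags start at column 12.
    for field in fields[11:]:
        if field.startswith("NH:i:"):
            return int(field[len("NH:i:"):])
    return None


def filter_sam(lines, max_hits=20):
    """Yield header lines plus alignment records whose NH value is below max_hits.

    Assumption: records without an NH tag are passed through unchanged.
    """
    for line in lines:
        if line.startswith("@"):
            # Header line: always keep so the output is still valid SAM.
            yield line
            continue
        nh = nh_value(line)
        if nh is None or nh < max_hits:
            yield line
```

Passing `sys.stdin` as `lines` and writing the result to `sys.stdout` would let it sit in a pipe between `samtools view -h` and `samtools view -b`. With the default `max_hits=20` it mirrors the `NH:<20` filter above.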