Tophat --max-multihits option
2
1
Entering edit mode
7.8 years ago
biolab ★ 1.4k

Dear all

I have a simple question: Tophat2 has --max-multihits option. If I set it to one, does it mean that each read is mapped at unique locus? This will loose many reads for multi-copy genes (for example, Actin genes). Could you please explain to me why some research work used "uniq mapping" reads?

tophat • 3.6k views
5
Entering edit mode
7.8 years ago
Martombo ★ 3.0k

Yes with --max-multihits 1 you're going to get only uniquely mapped reads. This is not such a bad idea as it may seem and actually a lot of programs for subsequent steps of the analysis will only use uniquely mapped reads (one for all HTSeq-count). This approach is very conservative and you lose quite a good number of reads. But in my experience (I also performed a few simulations to prove this) the results are very reliable. Basically all other possibilities (like with RSEM) make use of some assumptions: for example what happens if the ratio between the expression of two paralogs is different in two conditions? (for example for differential splicing) you will get a bias in the fold change estimate. while if you only consider unambiguous reads, you will only get a lower significance for an eventual differential expression. of these two scenarios I prefer the latter.

Have a look at these slides, to make it clearer. (the simulations are based on SMN1 and SMN2, which to my knowledge are two of the paralogs in the human genome with the highest similarity. they only have 2 mismatches on their sequence. given 100bp SE reads, 85% of the total reads of these two genes will be ambiguous, or multi mapped. 1000 simulations are plotted. the DE analysis was done with DESeq2)

https://www.dropbox.com/s/6w55godj2wetbed/unambiguous_counts.pdf?dl=0

0
Entering edit mode

0
Entering edit mode
6.4 years ago
glihm ▴ 650

To complete the response of Martombo, uniquely mapped reads can be useful when your are studying a special biological event like the translation for instance (With ribosome profiling). When we filter the data from sequencer, we select good quality reads and then the mapping is done keeping only uniquely mapped reads !

So, some duplicated regions will be removed (I mean, no reads will map on these regions), but others mapping are used to study these special cases if needed. ;)