Question: Tophat --max-multihits option
biolab wrote (5.1 years ago):

Dear all

I have a simple question: TopHat2 has a --max-multihits option. If I set it to 1, does that mean each read is mapped to a unique locus? This would lose many reads from multi-copy genes (for example, actin genes). Could you please explain why some research uses only "uniquely mapped" reads?

I appreciate any of your comments. Thank you very much!

Martombo (Seville, ES) wrote (5.1 years ago):
Yes, with --max-multihits 1 you will get only uniquely mapped reads. This is not as bad an idea as it may seem: many programs used in later steps of the analysis only use uniquely mapped reads anyway (HTSeq-count, for one). The approach is very conservative and you lose a good number of reads, but in my experience (I also ran a few simulations to check this) the results are very reliable. Basically, all the alternatives (such as RSEM) rely on some assumptions. For example, what happens if the ratio between the expression of two paralogs differs between two conditions (for instance because of differential splicing)? You will get a bias in the fold-change estimate. If you only consider unambiguous reads, you only get a lower significance for a possible differential expression. Of these two scenarios I prefer the latter.
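To make the fold-change argument concrete, here is a minimal Python sketch with made-up numbers. It compares counting only unambiguous reads against a deliberately naive rule that splits ambiguous reads equally between the two paralogs. The equal split is an assumption chosen for illustration only; it is not how RSEM or any other specific tool assigns reads. The function names and the expression values are hypothetical.

```python
def split_reads(true_a, true_b, ambiguous_fraction):
    """Split the true read counts of two paralogs into unique and ambiguous reads.

    Assumes (for illustration) that the same fraction of reads from each
    paralog is indistinguishable from the other paralog.
    """
    unique_a = true_a * (1 - ambiguous_fraction)
    unique_b = true_b * (1 - ambiguous_fraction)
    ambiguous = (true_a + true_b) * ambiguous_fraction
    return unique_a, unique_b, ambiguous


def fold_changes(cond1, cond2, ambiguous_fraction):
    """Return fold changes (condition 2 / condition 1) for paralogs A and B.

    cond1 and cond2 are (true_a, true_b) tuples of hypothetical read counts.
    """
    ua1, ub1, amb1 = split_reads(*cond1, ambiguous_fraction)
    ua2, ub2, amb2 = split_reads(*cond2, ambiguous_fraction)
    return {
        # Only unambiguous reads are counted.
        "unique_only": (ua2 / ua1, ub2 / ub1),
        # Naive rule: ambiguous reads are shared 50/50 between A and B.
        "equal_split": (
            (ua2 + amb2 / 2) / (ua1 + amb1 / 2),
            (ub2 + amb2 / 2) / (ub1 + amb1 / 2),
        ),
    }


# Hypothetical scenario: A goes up 4x between conditions, B does not change.
result = fold_changes(cond1=(100, 100), cond2=(400, 100), ambiguous_fraction=0.85)
for rule, (fc_a, fc_b) in result.items():
    print(f"{rule}: fold change A = {fc_a:.3f}, B = {fc_b:.3f}")
```

With these numbers the unique-only counts recover the true fold changes (4 for A, 1 for B) from far fewer reads, while the equal split pulls A down to about 2.7 and gives B a spurious change of about 2.3.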

 

Have a look at these slides to make this clearer. The simulations are based on SMN1 and SMN2, which to my knowledge are two of the most similar paralogs in the human genome: their sequences differ by only 2 mismatches. With 100 bp single-end reads, 85% of the total reads from these two genes are ambiguous (multi-mapped). 1000 simulations are plotted, and the DE analysis was done with DESeq2.

https://www.dropbox.com/s/6w55godj2wetbed/unambiguous_counts.pdf?dl=0
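As a rough illustration of why so many reads end up ambiguous when two paralogs differ at only a couple of positions, the sketch below counts the read start positions whose read covers none of the differing sites. The transcript length and the positions of the differences are hypothetical placeholders, not the real SMN1/SMN2 coordinates, and the sketch assumes uniform read coverage and that any read overlapping a differing site can be assigned unambiguously.

```python
def ambiguous_fraction(transcript_length, read_length, diff_positions):
    """Fraction of single-end reads that cover none of the differing positions.

    Positions are 0-based. A read starting at `start` covers the half-open
    interval [start, start + read_length).
    """
    n_starts = transcript_length - read_length + 1
    if n_starts <= 0:
        raise ValueError("read_length must not exceed transcript_length")
    ambiguous = 0
    for start in range(n_starts):
        end = start + read_length
        # A read is ambiguous if it overlaps no site that distinguishes the paralogs.
        if not any(start <= pos < end for pos in diff_positions):
            ambiguous += 1
    return ambiguous / n_starts


# Hypothetical values: a 1500 nt transcript, 100 bp reads, two differing sites.
fraction = ambiguous_fraction(1500, 100, diff_positions=[700, 1200])
print(f"ambiguous reads: {fraction:.1%}")
```

Longer reads or more differing sites lower the fraction, which is one way to judge how much a unique-only strategy would cost for a given gene pair.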


Thank you very much, Martombo, your detailed comment is very helpful!

(reply written 5.1 years ago by biolab)
glihm (France) wrote (3.7 years ago):

To complement Martombo's answer: uniquely mapped reads can be useful when you are studying a specific biological process, such as translation (with ribosome profiling). When we filter the data from the sequencer, we select good-quality reads, and the mapping is then done keeping only uniquely mapped reads!

So some duplicated regions will be lost (I mean, no reads will map to these regions), but other mapping approaches can be used to study these special cases if needed. ;)
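If you would rather map with multi-hits allowed and keep only the uniquely mapped reads afterwards, one possible approach is to filter on the NH tag, which the SAM specification defines as the number of reported alignments for the read. The sketch below works on plain SAM text lines and uses only the Python standard library. The function names and the toy records are made up for illustration, and it assumes the aligner actually writes NH tags and was run so that multi-mapped reads keep NH greater than 1.

```python
def is_uniquely_mapped(sam_line):
    """Return True if a SAM alignment line is mapped and carries NH:i:1."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:  # 0x4 = read is unmapped
        return False
    for tag in fields[11:]:  # optional tags follow the 11 mandatory columns
        if tag.startswith("NH:i:"):
            return int(tag[5:]) == 1
    return False  # no NH tag: uniqueness cannot be decided, so drop the read


def filter_unique(sam_lines):
    """Yield header lines unchanged and only uniquely mapped alignment lines."""
    for line in sam_lines:
        if line.startswith("@") or is_uniquely_mapped(line):
            yield line


# Toy records (hypothetical read names and coordinates).
header = "@SQ\tSN:chr1\tLN:1000"
unique = "read1\t0\tchr1\t10\t50\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:1"
multi = "read2\t0\tchr1\t20\t3\t4M\t*\t0\t0\tACGT\tIIII\tNH:i:2"
unmapped = "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"

for line in filter_unique([header, unique, multi, unmapped]):
    print(line)
```

Filtering after mapping keeps the multi-mapped reads available in the original file, so the duplicated regions can still be examined separately if needed.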
