Question: hisat2 parameters tuning to match tophat2 parameters
pierre.ortalo wrote, 6 days ago:

Hi, a quick introduction: some other bioinformatics students and I are working on an RNA-seq project over the summer to get our hands dirty and gain some experience. The project consists of reproducing a research team's RNA-seq pipeline on another dataset.

We would like to move from tophat2 (used by the team we work for) to hisat2. We are interested in this for two reasons: when multithreading on a computer farm, we get a "[failed]" outcome at the "writing tophat reports" step, and hisat2 is much more efficient, so we would like to learn to use this newer software. Moreover, reproducing the pipeline strategy with different software could further strengthen its "proof of concept".

However, we face difficulties setting up parameters.

It would be very kind of you if you could give us some guidance on how to reproduce these tophat2 parameters in hisat2:

(all other parameters as default):

“--min-intron-length 10 --max-intron-length 20000 --read-mismatches 3 --read-gap-length 2 --read-edit-dist 3 --max-multihits 2 --b2-sensitive --segment-mismatches 2 --segment-length 15 --min-segment-intron 10 --max-segment-intron 20000 --no-coverage-search”.

Another run using:

“--read-gap-length 1000 --read-edit-dist 1003 --b2-ma 3 --b2-rdg 3,1”

I understand that it is a lot to ask, but this is an obstacle (at a very early step of the pipeline). Hisat2 parameters are still very cryptic to us. So even if you could just explain some underlying concepts that would help us do it ourselves, it would be very nice!

Thanks in advance


Take a look at Simulation-based comprehensive benchmarking of RNA-seq aligners and see if it helps (indirectly).

(comment by genomax, 6 days ago)

As a general rule of thumb for most bioinformatics tools: the default settings should be reasonable for standard situations. Only when your dataset is "different" should you start fiddling around with parameters.

(comment by WouterDeCoster, 6 days ago)

Istvan Albert (University Park, USA) wrote, 5 days ago:

It appears that most of the parameters that you are asking about are in the hisat2 manual:

--min-intronlen <int>
--max-intronlen <int>
--rdg <int1>,<int2>

etc. So finding out what stayed the same would simplify your question. Then some parameters probably don't apply since internally the algorithm has changed.
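
As a rough illustration of "finding out what stayed the same", here is a minimal Python sketch that splits a tophat2 option string into options with a like-named hisat2 flag and options that still need to be reviewed by hand. The lookup table is an assumption: it contains only the three hisat2 flags listed above, paired with the tophat2 options they appear to correspond to, and the helper name `split_options` is made up for this example. Similar names do not guarantee identical behaviour, so check each pair against both manuals.

```python
import shlex

# Assumed correspondence, limited to the three hisat2 flags named above.
# Matching names do not guarantee matching semantics; verify in both manuals.
TOPHAT2_TO_HISAT2 = {
    "--min-intron-length": "--min-intronlen",
    "--max-intron-length": "--max-intronlen",
    "--b2-rdg": "--rdg",
}


def split_options(tophat2_args):
    """Return (carried_over, unmapped) option lists for a tophat2 option string."""
    tokens = shlex.split(tophat2_args)
    carried, unmapped = [], []
    i = 0
    while i < len(tokens):
        flag = tokens[i]
        # A flag owns the next token as its value unless that token is another flag.
        has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
        value = [tokens[i + 1]] if has_value else []
        if flag in TOPHAT2_TO_HISAT2:
            carried += [TOPHAT2_TO_HISAT2[flag]] + value
        else:
            unmapped += [flag] + value
        i += 1 + len(value)
    return carried, unmapped


# The two runs from the question.
run1 = (
    "--min-intron-length 10 --max-intron-length 20000 --read-mismatches 3 "
    "--read-gap-length 2 --read-edit-dist 3 --max-multihits 2 --b2-sensitive "
    "--segment-mismatches 2 --segment-length 15 --min-segment-intron 10 "
    "--max-segment-intron 20000 --no-coverage-search"
)
run2 = "--read-gap-length 1000 --read-edit-dist 1003 --b2-ma 3 --b2-rdg 3,1"

for name, args in (("run 1", run1), ("run 2", run2)):
    carried, unmapped = split_options(args)
    print(name, "carried over to hisat2:", " ".join(carried))
    print(name, "review by hand:", " ".join(unmapped))
```

The carried-over list is only an option fragment, not a complete hisat2 command. Everything in the "review by hand" list either has no direct counterpart or needs a closer look at the manual.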

In general, I would not try to force one aligner to work exactly the same way as another. In addition, I would be very cautious about setting this many parameters. Users are often under the impression that tools work exactly as described and that they understand what parameters do and how they interact.

In my opinion, this is rarely the case: not even the developer of the software may fully understand the many ways these parameters interact (not to mention the unexpected effects due to the order in which the various conditions are applied).

If your reporting crashes it is not because the aligner did not work the "right" way - it is because your reporting relies on features it should not.

Take comfort in believing what the authors state, that HiSat2 is a more efficient and better aligner than TopHat2 so you probably don't even need all those settings.
