Question: Kallisto-Sleuth or Kallisto-DESeq2?
Mozart 190 wrote (2.7 years ago):

Hello everyone, I am using Kallisto-Sleuth at the very end of my pipeline for RNA-seq analysis. As a newbie, I am trying to find on the web all the reasons in favour of my choice. I would like to ask why choosing Kallisto and Sleuth at the end of my pipeline would be a better choice than DESeq2.

rna-seq • 3.0k views
modified 9 days ago by jocelyn.petitto 10 • written 2.7 years ago by Mozart 190

Sounds like you made that choice without recognizing the difference between alignment and mapping.

written 2.7 years ago by genomax 85k
Istvan Albert ♦♦ 84k (University Park, USA) wrote (2.7 years ago):

Kallisto is not an alternative to DESeq2.

Kallisto does the quantification (assigns reads to transcripts). You can run DESeq2 on the estimated counts (the est_counts column) that kallisto outputs, after rounding these counts to integers.
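
A minimal Python sketch of the rounding step, assuming each sample has kallisto's standard abundance.tsv (tab-separated, with target_id and est_counts columns). The helper names read_est_counts and build_count_matrix, the sample labels, and the toy values are hypothetical. In R, the tximport package is a commonly used route for bringing kallisto output into DESeq2, so treat this only as an illustration of what "rounding to integers" means.

```python
import csv
import io


def read_est_counts(handle):
    """Return {target_id: est_counts} parsed from a kallisto abundance.tsv handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["est_counts"]) for row in reader}


def build_count_matrix(samples):
    """Round estimated counts to integers, one row per transcript.

    `samples` maps a sample name to a {target_id: est_counts} dict.
    Transcripts missing from a sample are given a count of 0.
    """
    names = sorted(samples)
    targets = sorted({t for counts in samples.values() for t in counts})
    matrix = {
        t: [int(round(samples[n].get(t, 0.0))) for n in names] for t in targets
    }
    return names, matrix


# Toy input standing in for two abundance.tsv files (values are made up).
ctrl = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t850\t10.6\t5.0\n"
    "tx2\t2000\t1850\t0.4\t0.1\n"
)
treat = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t850\t3.2\t1.5\n"
    "tx2\t2000\t1850\t7.7\t2.0\n"
)

names, matrix = build_count_matrix(
    {"ctrl": read_est_counts(ctrl), "treat": read_est_counts(treat)}
)
```

The resulting matrix has one integer column per sample, in the order given by names, which is the shape a count-based tool expects.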

Sleuth is the "alternative" to DESeq2.

modified 2.7 years ago • written 2.7 years ago by Istvan Albert ♦♦ 84k

Thanks, and sorry if I am a bit confused about this. Which pipeline is the best among the following?

  1. fastq file --> STAR --> bam file --> htseq-count --> .txt --> DESeq2 (PCA/DE)
  2. fastq file --> kallisto --> Sleuth --> differential expression analysis

Or do you prefer a different approach? Finally, I am not sure I have understood how to define the best "formula": can one say "this is the best solution"? Please help me, and sorry for my English.

written 2.7 years ago by Mozart 190

There is no gold standard that outperforms the others. Both of the pipelines that you mention are valid and perform well. For the end user (and this is just my personal opinion) it all comes down to which pipeline you feel most comfortable with, in terms of documentation and handling. DESeq2, Sleuth, and the other established approaches such as edgeR all work and are well tested. In my experience, end users tend to use whatever they first came into contact with, whether by searching around, reading blogs, or through the instructor of their first RNA-seq workshop. If you read the blogs of the biostatisticians who develop these tools, even they acknowledge that each approach has its own merits, without any one outperforming the others (see here for an example). So choose whatever you feel most comfortable with, follow the instructions in the manuals and vignettes, and get on with your analysis.

written 2.7 years ago by ATpoint 36k

Well said. Keep in mind that the hypotheses generated need to be independently verified by experiment in any case.

written 2.7 years ago by genomax 85k

Thank you for your replies. I come from a different background and I am looking at these tools for the first time. Genomax, how can I verify the hypotheses? If I have not misread what you said, you are basically stating that after running my pipeline I have to verify the results experimentally (so should this further assessment be done in silico as well?). Is there any way to validate my results prior to going to 'the bench'?

Secondly, is it a good idea to use kallisto output as input for DESeq2, or is it better to use DESeq2 with the STAR pipeline? Or might it be worth running both fastq file --> kallisto --> DESeq2 and fastq file --> kallisto --> Sleuth?

written 2.7 years ago by Mozart 190

Is there any way to validate my results prior to going to 'the bench'?

As long as the biological differences are prominent in your dataset, the results you get from DESeq2 or Sleuth should be reasonably similar. Someone with domain knowledge of the experiment (if that is not you) should be able to look at the results and judge whether they tell a coherent story. They will also be able to decide which genes to select for further experimental validation.

written 2.7 years ago by genomax 85k

Thanks for both replies. Would it be a good idea to carry out two parallel workflows in order to double-check the results?

written 2.7 years ago by Mozart 190

If it's your own curiosity, then you can run two workflows - it could be seen as a training exercise.

As an example, though, I once ran the following analyses on the same data:

  1. Tophat2 --> raw count abundance with BEDTools (a 'hack') --> DESeq2
  2. Kallisto --> DESeq2

...and got the same results where it mattered.

As genomax says, "as long as the biological differences are prominent in your dataset", you should get the same results from each of the standard/accepted methods.
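
One way to check agreement "where it mattered" is to compare the sets of significant genes from the two runs. A minimal Python sketch, assuming each pipeline's results have been reduced to a mapping of gene ID to (effect size, adjusted p-value), where the effect size is something like a log2 fold change and only its sign is used. The function names, the 0.05 cutoff, and the toy values are illustrative assumptions, not output from any real analysis.

```python
def significant(results, alpha=0.05):
    """Return the set of gene IDs whose adjusted p-value is below alpha."""
    return {gene for gene, (_, padj) in results.items() if padj < alpha}


def compare_pipelines(a, b, alpha=0.05):
    """Summarise agreement between two differential expression result sets."""
    sig_a, sig_b = significant(a, alpha), significant(b, alpha)
    shared = sig_a & sig_b
    union = sig_a | sig_b
    # Among genes called significant by both, count those whose effect
    # points in the same direction (both up or both down).
    same_direction = sum(1 for g in shared if (a[g][0] > 0) == (b[g][0] > 0))
    return {
        "shared": sorted(shared),
        "only_a": sorted(sig_a - sig_b),
        "only_b": sorted(sig_b - sig_a),
        "jaccard": len(shared) / len(union) if union else 1.0,
        "same_direction": same_direction,
    }


# Made-up results: gene -> (effect size, adjusted p-value)
pipeline_a = {"geneA": (2.1, 0.001), "geneB": (-1.4, 0.03), "geneC": (0.2, 0.60)}
pipeline_b = {"geneA": (1.9, 0.004), "geneB": (-1.1, 0.08), "geneC": (0.9, 0.04)}

summary = compare_pipelines(pipeline_a, pipeline_b)
```

Genes that land in only_a or only_b usually sit near the significance cutoff, so it is worth looking at their adjusted p-values in both runs before reading much into the disagreement.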

written 2.7 years ago by Kevin Blighe 61k

Wow, you clarified a lot of things! Thank you very much. Can you please help me with another question I posted below?

written 2.7 years ago by Mozart 190
jocelyn.petitto 10 wrote (9 days ago):

I think it is helpful to have more than one pipeline in mind because, as I have found, sometimes a tool just will not work. I can't open the kallisto-created h5 files with sleuth because of the environment in which they are created, and I have yet to figure out how to change it so that it will work. Note: I am NOT asking that as a question here, simply saying that with these tools it is also a good idea to know your options, because sometimes they don't work and you need results by X day without necessarily having the time to fix the issue.

written 9 days ago by jocelyn.petitto 10

I strongly discourage accepting that things just do not work. Switching tools whenever an issue comes up is not how one can manage ongoing projects. Better to invest some time investigating the cause of the issue. There must be a reason, and maybe it is something the user did unknowingly by changing some environment parameters, but especially then reproducibility demands that you sniff it out. Imagine you have an ongoing project with tons of data based on a certain pipeline, and then you run into an issue with new data. Are you going to re-run everything with a new pipeline? What if this new pipeline has some issues as well at some point? If you need help (since you also posted this in another thread), then please add details in that other thread, as Kevin asked already. Show details and code, please (yes, I realize you did not post this as a question here).

written 7 days ago by ATpoint 36k

I don't think it is fair to criticize the answer for the suggestion that sometimes we need to accept that tools do not work.

Implying that a tool's failure to operate is necessarily the end user's fault may occasionally be true, mostly for newcomers, but I really doubt there is any bioinformatician out there who has never given up on a pipeline because it simply did not work as advertised. It is a reality that we ought to accept.

In general, being able to analyze data with different pipelines is excellent advice. Running the same data through two different pipelines will teach people more about reproducibility, systematic errors, subjective parameter settings etc than any formal class work.

modified 6 days ago • written 6 days ago by Istvan Albert ♦♦ 84k

Thank you. I strongly discourage assuming that I did not investigate the issue or try to determine how to fix it. Sleuth failing to read kallisto-produced h5 files is an ongoing, well-established issue. I read quite a few of the reports of the same problem logged on the kallisto git, weighed my options against my working knowledge of the cluster I am working on and the module versions currently available there, and saw that the issue could come down to what type of computer the process was run on. I then decided to move on.

Weighing the time needed to solve this specific issue against getting ready for the qualifying exam that I needed to take (passed), for which this would have been preliminary data, I decided it was more "cost effective" to investigate other pipelines after my qualifier (the differential analysis was more "icing on the cake" for my QE than a necessity). Speaking as someone who can easily get in too deep solving problems that could be better approached a different way, coming to the conclusion that I should move to a different pipeline was an important (and significant) one for me. Therefore I shared it as a reality of doing this type of work. Honestly, how many pieces of pipelines are written by a graduate student and not kept up? (Not the case with kallisto and sleuth, but in the bigger picture.)

Subsequently, I have discovered that I can have the bootstrap analysis output as text files, and now I am curious to know whether I can use another method to turn those into an h5 file that sleuth can read. However, it isn't particularly important to me at this juncture that I figure this out, because I have other options.

I agree with Istvan that it is important to be able to use multiple methods and compare results. From what I gather, that approach is expected, particularly in scRNA-seq (not what I was trying to do, but a future direction for me), where you have to weigh dropout versus modality.

written 6 hours ago by jocelyn.petitto 10