Question: Correlation of read count values from STAR and Kallisto
3.7 years ago, parashar.dhapola (150), United States, wrote:

Hi,

I'm analyzing an RNA-Seq dataset of yeast (GSE71712) using STAR and Kallisto.

correlation • 2.8k views
modified 3.7 years ago • written 3.7 years ago by parashar.dhapola (150)

Kallisto has been highly anticipated, even though it hasn't been published yet. The comparisons you are making are very important and much needed for validation. It wouldn't help if Kallisto were 1000x faster but its counts did not correlate well. That said, the counting function in STAR is also quite new, so I would compare the counts to another method, e.g. htseq-count or easyRNASeq in R, and see how that works out.

written 3.7 years ago by Michael Dondrup (46k)

Thanks for the reply Michael,

I'm trying to use RSEM downstream of STAR. I don't want to take a naive read count approach (like HTSeq/featureCounts/easyRNASeq). I'm looking forward to carrying out isoform-based DE calling (EBSeq downstream of RSEM+STAR and Sleuth downstream of Kallisto).

written 3.7 years ago by parashar.dhapola (150)

I haven't tried Kallisto yet, but I wonder how much of an effect this will really have on downstream DE analysis. Just because Kallisto counts are really different from STAR's doesn't necessarily mean the downstream DE will be vastly different, unless you are trying to perform DE on genes within one sample, which I don't think is very valid anyway. Whatever biases are introduced by Kallisto or STAR might be consistent among the samples you are comparing, or they might not.

I recommend trying this on two datasets, performing the DE, and then looking at the correlation of fold-changes.

If that shows good correlation, then I think the differences will probably just come down to the genes with many multi-mapped reads (conserved domains, isoforms...) and how Kallisto and STAR deal with them.
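
To make the fold-change comparison concrete, here is a minimal sketch in plain Python. It is not a DE tool: it just takes a crude log2 ratio of mean counts (with a pseudocount, which is my assumption) and correlates the per-gene values from two pipelines. The function names and the input layout (dicts keyed by gene ID) are hypothetical; in practice you would feed in the fold-changes reported by whatever DE tool you run on each pipeline.

```python
from math import log2, sqrt


def log2_fold_change(cond_a, cond_b, pseudocount=1.0):
    """Crude log2 ratio of mean counts (condition B over condition A).

    The pseudocount is an assumption made here to avoid log(0); it is
    not a substitute for a proper DE model.
    """
    mean_a = sum(cond_a) / len(cond_a)
    mean_b = sum(cond_b) / len(cond_b)
    return log2((mean_b + pseudocount) / (mean_a + pseudocount))


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fold_change_correlation(lfc_one, lfc_two):
    """Correlate per-gene log2 fold-changes from two pipelines.

    Both arguments are dicts mapping gene ID -> log2 fold-change;
    only genes present in both are compared.
    """
    shared = sorted(set(lfc_one) & set(lfc_two))
    return pearson([lfc_one[g] for g in shared],
                   [lfc_two[g] for g in shared])
```

If the fold-changes agree well even though the raw counts do not, the count differences are probably consistent across samples and matter less for DE.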

modified 3.7 years ago • written 3.7 years ago by Damian Kao (15k)

Dear Damian,

Thanks for your reply.

I have replicates and other samples in this dataset, and I'll share the results of those too. However, I'm not so sure about using a DE tool to make the comparison. For example, DESeq is known to not play very well with Kallisto output. Hence, the compatibility of STAR/Kallisto with the DE tool might itself introduce some biases. Nevertheless, it is worth a try, and I will surely get back to you with the results.

written 3.7 years ago by parashar.dhapola (150)

I would do a comparison of STAR's GeneCount vs a dedicated DE tool like edgeR or DESeq on STAR's output. They should be the same, right, given the same input data? But since I've seen so many Kallisto vs Salmon vs TopHat comparisons and no one has ever mentioned a difference in distribution before, I would suspect STAR's GeneCount over Kallisto (as much as I love STAR).

Great post though - and thank you for taking the time to show us this graphic :)

modified 3.7 years ago • written 3.7 years ago by John (12k)

As the developer of Sailfish and Salmon, I've done quite a bit of comparison against STAR counts at the gene level. While you will see (sometimes systematic) differences, I've never seen anything this stark. Further, given the similarities between Sailfish and Kallisto, by transitivity, I wouldn't expect to see such a tremendous difference between those methods. Could you provide a bit more detail about how you've computed these results? That is, what transcriptome did you use for Kallisto, how did you aggregate the counts to the gene level, etc.? Typically, we see (Spearman) correlations in the high 0.8s to the mid 0.9s between Sailfish or Salmon and STAR at the gene level; I'd expect something similar from Kallisto.
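
For anyone who wants to check this kind of number themselves, here is a rough sketch of the two steps mentioned above: summing transcript-level estimates to the gene level, then computing a Spearman correlation against gene-level counts from another tool. It is plain Python with no dependencies, the function names and the dict-based inputs are hypothetical, and it is not how Sailfish, Salmon, or Kallisto do things internally.

```python
from collections import defaultdict
from math import sqrt


def aggregate_to_genes(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level totals.

    tx_counts: dict of transcript ID -> estimated count
    tx2gene:   dict of transcript ID -> gene ID (e.g. built from a GTF)
    """
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is not None:  # skip transcripts missing from the mapping
            gene_counts[gene] += count
    return dict(gene_counts)


def average_ranks(values):
    """Rank values starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def gene_level_spearman(counts_a, counts_b):
    """Spearman correlation over the genes present in both count dicts."""
    shared = sorted(set(counts_a) & set(counts_b))
    return spearman([counts_a[g] for g in shared],
                    [counts_b[g] for g in shared])
```

One thing worth checking with a sketch like this is how many transcripts get skipped because they are missing from the mapping; a mismatch between the transcriptome and the annotation is an easy way to end up with poor gene-level agreement.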

written 3.7 years ago by Rob (3.6k)

Dear Rob,

I obtained annotation data from Ensembl release 83 (FTP Link).

I wrote a small piece of code (it was overkill because I tried to make a more generic GTF parser, but it did its job right). You can review the code here:

I'm trying to see how Salmon performs on this dataset next. I welcome your further comments.

modified 14 months ago by RamRS (24k) • written 3.7 years ago by parashar.dhapola (150)

Interesting; any reason to not use the existing cdna file? Is the experiment you're sequencing public? (I see this is in the original post ;P). I'll be interested to take a look.

modified 14 months ago by RamRS (24k) • written 3.7 years ago by Rob (3.6k)

Check out the latest update. It is quite interesting: Salmon actually has very high correlation with STAR+RSEM (r=0.98). You were definitely right about your tests. But this brings me back to the original question: what's wrong with the read count values from Kallisto?

modified 3.7 years ago • written 3.7 years ago by parashar.dhapola (150)

Thanks for the reply, John,

As you can see from the updated post, STAR's counting strategy is not so bad after all. But this is nowhere near conclusive. I wish to see how Salmon performs here.

written 3.7 years ago by parashar.dhapola (150)

Hello parashar.dhapola!

We believe that this post does not fit the main topic of this site.

Post closed

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

modified 14 months ago by RamRS (24k) • written 3.7 years ago by parashar.dhapola (150)

Why did you close this (and apparently remove most of the content)?

written 3.7 years ago by Devon Ryan (92k)

I was wondering the same thing.  Was the post closed by the original poster, or, considering the fairly cryptic final message, the admins?  And, yes, what happened to all of the content?!

Update:  For posterity (or in case the OP wants to re-start the discussion) --- here is the content of the main post at the time it was closed: (see below; since it's not clear exactly why the original post was closed and all of the content removed, I'm removing the below unless OP requests it).

 

modified 3.7 years ago • written 3.7 years ago by Rob (3.6k)

re-opening.

written 3.7 years ago by Pierre Lindenbaum (124k)

If the thread author wants to remove it, that's OK, no? I know it's not exactly great, but I would have thought their wishes would be most important.

Moreover, they are probably closing it because they discovered that the weird Kallisto result was due to a little user error (a different annotation file or input file being used, etc.) and just wanted the whole thing closed so as to not waste anyone else's time. Speaking as an expert on the subject, they were probably embarrassed...

modified 3.7 years ago • written 3.7 years ago by John (12k)

I agree, but in that case, OP should say so (e.g. via an update at the top of the message with the fixed result or some such). The last message before the post was originally closed is very cryptic, and suggests that it was closed by mods (even though that doesn't seem to be the case). EDIT: In light of John's interpretation, I'm removing the content of the original post from my status until / unless OP requests it be put back.

modified 3.7 years ago • written 3.7 years ago by Rob (3.6k)

Agreed. If you're out there, parashar, please let us know what happened so we can help other users in the future who run into the same issue :)
User errors are far more common (and more difficult to identify) than program errors, so dissecting them is non-trivial :)

written 3.7 years ago by John (12k)

The author of the closing message is the OP. We do allow people to close their own posts, but once a post is closed they cannot reopen it; only mods can. He might have been playing around with the options...

written 3.7 years ago by Istvan Albert (81k)

3.7 years ago, Rob (3.6k), United States, wrote:

Made this a comment above rather than an "answer", since it is not one.

modified 3.7 years ago • written 3.7 years ago by Rob (3.6k)