Question

How is the bootstrapping information from kallisto treated in DESeq2 when imported using tximport?

4.0 years ago

divya.nandakumar ▴ 30

I have transcript abundances from kallisto run with 100 bootstraps. My understanding is that bootstrapping gives information about the variability in the abundance estimates. If I use tximport to import these abundances for use in DESeq2, is the variance information from bootstrapping used by DESeq2 in any way, or does DESeq2 calculate the variance in a different way?

I see in the tximport manual that there is a way to import the inferential replicate values by setting txOut=TRUE, and that varReduce summarizes the inferential replicates into one variance value per transcript. But is this information used by DESeq2 in any way during the differential expression analysis?

Also, does RSEM perform any variance calculation for the estimated counts?

Background: I am trying to compare kallisto -> sleuth with featureCounts -> DESeq2. kallisto followed by sleuth shows no significantly differentially expressed genes (at the transcript or gene level), while featureCounts -> DESeq2 shows several differentially expressed genes. To find out whether this is an effect of having the variance data, I wanted to try running the kallisto transcript abundances through DESeq2.

RNA-Seq kallisto deseq2 tximport • 2.8k views

ADD COMMENT • link updated 4.0 years ago by ATpoint 82k • written 4.0 years ago by divya.nandakumar ▴ 30

4.0 years ago

ATpoint 82k

Edit (14th Sep 2020): see the answer from the tximport maintainer, Michael Love, that just came in. The tximport vignette was outdated with respect to inferential replicate support, so my answer (based on the vignette) does not apply. I will manually mark Mike's answer as accepted so that it appears as the first answer.

Summary: tximport has supported inferential replicates from kallisto since August 2018, which I think corresponds to version 1.9.9. The relevant line in the vignette is here.

No, tximport does not import bootstrap information from kallisto when summarizing to the gene level. The tximport manual states this explicitly: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#kallisto

"Because the kallisto_boot directory also has inferential replicate information, it was imported as well (and because txOut=TRUE). As with Salmon, inferential replicate information will not be summarized to the gene level."

I think your comparison is not informative, since you are comparing two different quantification methods (kallisto => pseudoalignment, featureCounts => traditional alignment-based quantification) and, on top of that, two different statistical frameworks. For a meaningful comparison, keep either the quantification or the downstream statistics constant. As it stands, the differences could simply be due to the quantification method. If you get no DEGs, check whether you have enough power, and whether the samples are of good quality, have sufficient depth, and cluster well in a PCA. It is also always possible that the biological truth is that there are no DEGs at all.

ADD COMMENT • link 3.6 years ago by ATpoint 82k

Thank you for replying. I saw that statement in the manual, but I thought it meant that the information would not be used for gene-level analysis but would still apply if I were to look at differential expression at the transcript level. What is the purpose of the varReduce argument when importing the data?

I understand they are completely different methods of analysis. Based on other experimental data (qPCR, microarray), we know that there are differentially expressed genes (the mutant is of a transcription factor), and the DEGs from DESeq2 are consistent with what we would expect. It is also quite odd that the PCA from the kallisto data showed poor separation of samples (particularly for one replicate), while the PCA plot from featureCounts + DESeq2 showed substantial separation of samples along one axis. I was wondering whether the bootstrapping was bringing out any underlying problems between the replicates. kallisto -> sleuth would be more convenient to use merely because of the speed of the analysis, and I was trying to see whether it is comparable to DESeq2.

ADD REPLY • link 4.0 years ago by divya.nandakumar ▴ 30

If one method does not agree with expectations that have been confirmed by other methods, then do not use it, right? There are alternatives such as salmon if you want a lightweight quantifier. Salmon offers several handy features, such as GC and sequence bias correction, and can now use decoy sequences and selective alignment to improve accuracy.

ADD REPLY • link 4.0 years ago by ATpoint 82k

Please see the updated answers.

ADD REPLY • link 3.6 years ago by ATpoint 82k

score 1 · Accepted Answer · 2020-09-14

Apologies -- the tximport vignette was out of date and I didn't realize until just now. We added inferential replicate import and/or inferential variance calculation at the gene level to tximport in August 2018, but I forgot that these sentences were still in the vignette sections. I've just now updated the text.
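To make "inferential variance" concrete, here is a minimal Python sketch of the idea behind reducing inferential replicates to one variance per transcript, as the varReduce option is described in the question. It is not tximport's code: the transcript IDs, the bootstrap values, and the helper name `inferential_variance` are all made up for illustration, and it assumes the reduction is a plain sample variance across the bootstrap replicates of one sample.

```python
from statistics import variance

# Hypothetical bootstrap estimated counts for ONE sample:
# transcript ID -> one estimated count per bootstrap replicate.
bootstrap_counts = {
    "tx1": [100.0, 104.0, 98.0, 102.0, 96.0],  # stable estimate
    "tx2": [10.0, 40.0, 0.0, 25.0, 5.0],       # highly uncertain estimate
}


def inferential_variance(boots):
    """Collapse the bootstrap replicates to one sample variance per transcript."""
    return {tx: variance(values) for tx, values in boots.items()}


inf_var = inferential_variance(bootstrap_counts)
for tx, v in inf_var.items():
    print(tx, v)
```

A transcript whose bootstrap estimates jump around (tx2 here) ends up with a much larger variance than one whose estimates are stable (tx1), which is the per-transcript uncertainty that methods such as sleuth and Swish take into account.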

In the Swish paper, we performed gene-level analysis with inferential replicates/variance used by sleuth and Swish. Because there is less gene-level inferential uncertainty for typical bulk RNA-seq compared to transcript-level inferential uncertainty, it doesn't make as much of a difference. However, for 3' tagged RNA-seq there is still substantial gene-level uncertainty.
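The point about gene-level versus transcript-level uncertainty can be illustrated with a small Python sketch. The numbers are invented and the names (`tx_boots`, `gene_boots`) are hypothetical; the sketch assumes that gene-level inferential replicates are formed by summing the transcript-level bootstrap values of a gene within each replicate. When reads are ambiguous between two isoforms of the same gene, the bootstraps shift counts from one isoform to the other, but the gene total barely moves.

```python
from statistics import variance

# Hypothetical bootstrap counts for two isoforms of the same gene in one sample.
# Ambiguous reads move between the isoforms from replicate to replicate.
tx_boots = {
    "geneA_tx1": [60.0, 45.0, 70.0, 50.0, 75.0],
    "geneA_tx2": [40.0, 56.0, 30.0, 49.0, 25.0],
}

# Gene-level replicates: sum the isoforms within each bootstrap replicate.
gene_boots = [sum(values) for values in zip(*tx_boots.values())]

tx_var = {tx: variance(values) for tx, values in tx_boots.items()}
gene_var = variance(gene_boots)

print("transcript-level variances:", tx_var)
print("gene-level replicates:", gene_boots)
print("gene-level variance:", gene_var)
```

In this toy case the gene-level variance is far smaller than either transcript-level variance, which matches the statement that inferential uncertainty matters less for gene-level analysis of typical bulk RNA-seq.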