Question

Normalization of RNA-seq data with Kallisto/Sleuth for comparisons across treatments

3.0 years ago

a.sugi30 • 0

I am running an RNA-sequencing experiment in which I am analyzing differential gene expression in oysters collected from different locations. I plan on using the Cyverse DNA Subway Green Line platform, which uses Kallisto and Sleuth. Since I will be conducting multiple comparisons (i.e. oysters from Site 1 vs oysters from Site 2 vs oysters from Site 3, etc.), I understand that I could run into significant statistical issues involving the inferential and individual variation of each sample. Will the Kallisto and Sleuth algorithms correct for this? I imagine I will need to run all of my samples simultaneously through Kallisto so that normalization is done across all samples. Will this be sufficient to mitigate the noise from individual sample variation and make the biological variation easier to detect? Or would I need to employ normalization methods such as TMM via edgeR? I am pretty new to this and learning along the way, so any feedback is much appreciated!

Thanks in advance.

Kallisto Normalization RNA-seq TPM Sleuth • 2.1k views

updated 3.0 years ago by Ram • written 3.0 years ago by a.sugi30

Ram · Accepted Answer · 2021-05-06

If, and I know this is a big IF, money were no object, it would be good to use biological replicates from each "site", that is, build 3 RNA-seq libraries from 3 RNA extractions from a particular site. When you say biological variation, do you mean "significant differences in transcript level due to site"? I ask just to clarify, because people also use terms like "technical" and "biological" for replicates, as I am doing when referring to replicates of your RNA. I am assuming so... but you know what they say about "assume".

How will you isolate RNA? From individuals or from populations? From the whole organism, or from a specific tissue, organ, or life stage? There are lots of variables to think about, and they will all contribute to the variation that you are worried about. You would want to pick similar "sample" types for RNA isolation for each biological replicate at each site.

You might be able to get away with duplicates for each "site", but without knowing the inherent variability of wild oyster transcriptomes, and the "source" of RNA you will choose for each "site", it's hard to say if 2 would get you the statistics you need.

Of course, depending on your ultimate aim, people do 1X1 sample comparisons for differentially expressed gene (DEG) analyses, but most software wants duplicates. I have used GFOLD for 1X1 comparisons, following Example 3 at https://zhanglab.tongji.edu.cn/softwares/GFOLD/index.html. You just want to be sure to confirm individual transcripts with something like RT-PCR on multiple samples before getting excited about any DEGs found with this method.

score 2 · Accepted Answer · 2021-05-06

When you run kallisto quant, just make sure you specify the bootstrap option via -b.
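As a rough sketch of what that looks like when scripted, the snippet below builds one paired-end `kallisto quant` command per sample with `-b` set. It only prints the commands and does not run them. The index name, FASTQ paths, sample names, output directory, thread count, and the choice of 100 bootstraps are all placeholders I made up for illustration; DNA Subway sets these through its own interface.

```python
from pathlib import PurePosixPath


def kallisto_quant_cmd(index, out_dir, fastq_1, fastq_2, bootstraps=100, threads=4):
    """Build (but do not run) a paired-end `kallisto quant` command for one sample."""
    return [
        "kallisto", "quant",
        "-i", str(index),        # transcriptome index built with `kallisto index`
        "-o", str(out_dir),      # one output directory per sample
        "-b", str(bootstraps),   # bootstrap samples, used later by sleuth
        "-t", str(threads),
        str(fastq_1), str(fastq_2),
    ]


# Hypothetical sample sheet: sample name -> (R1, R2) FASTQ files.
samples = {
    "site1_rep1": ("reads/site1_rep1_R1.fastq.gz", "reads/site1_rep1_R2.fastq.gz"),
    "site2_rep1": ("reads/site2_rep1_R1.fastq.gz", "reads/site2_rep1_R2.fastq.gz"),
}

# Each sample gets its own independent kallisto run.
commands = {
    name: kallisto_quant_cmd(
        "oyster_transcripts.idx",            # placeholder index file name
        PurePosixPath("kallisto_out") / name,
        r1,
        r2,
    )
    for name, (r1, r2) in samples.items()
}

for cmd in commands.values():
    print(" ".join(cmd))
```

Each command could then be run from a shell or via `subprocess.run(cmd, check=True)`.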

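You don't need to worry about normalization when running kallisto, because kallisto only does quantification to get you the estimated counts (and TPM) for each sample on its own; it doesn't do any between-sample normalization, so there is nothing to gain from running all of your samples through it simultaneously.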

Sleuth (after you run kallisto) will automatically handle all the appropriate normalizations for you.
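To give a feel for what between-sample normalization means here: as far as I understand it, sleuth's default is a DESeq-style median-of-ratios size factor computed from the kallisto estimated counts. The sketch below is a simplified pure-Python illustration of that idea, not sleuth's actual code. The function name and the toy counts are invented, and details such as sleuth's transcript filtering and rescaling of the factors are left out.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of estimated counts,
            with transcripts in the same order for every sample.
    """
    samples = list(counts)
    n_transcripts = len(counts[samples[0]])

    # Use only transcripts with a nonzero count in every sample,
    # so that the geometric mean is well defined.
    kept = [i for i in range(n_transcripts)
            if all(counts[s][i] > 0 for s in samples)]

    # Per-transcript geometric mean across samples (a pseudo-reference sample).
    geo_means = [
        math.exp(sum(math.log(counts[s][i]) for s in samples) / len(samples))
        for i in kept
    ]

    # Size factor = median ratio of a sample's counts to the pseudo-reference.
    return {
        s: median([counts[s][i] / g for i, g in zip(kept, geo_means)])
        for s in samples
    }


# Toy data (made up): site2 was sequenced about twice as deeply as site1.
toy_counts = {
    "site1": [100, 200, 300, 0],
    "site2": [200, 400, 600, 50],
}

factors = size_factors(toy_counts)
normalized = {s: [c / factors[s] for c in toy_counts[s]] for s in toy_counts}
print(factors)
```

Dividing each sample's counts by its size factor puts the samples on a common scale, which is why a separate TMM step through edgeR is not something you would normally add on top of sleuth.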