Analysing RNA-Seq data with spike-ins
6.7 years ago
Assa Yeroslaviz ★ 1.6k

Hi,

We're working on an RNA-Seq data set in which we used spike-ins to control for transcriptional amplification. After running the analysis in the standard way and finding only a few significantly regulated genes (~70), we adjusted the normalization according to the spike-ins and got over 2000 genes with a significant adjusted p-value.

We first calculated the size factors from the spike-in subset only and passed them to the DESeq object before running the analysis.
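
As an illustration of this step, here is a minimal Python sketch. It assumes the size factors follow the DESeq-style median-of-ratios scheme, computed on the spike-in rows only and then applied to every gene. The gene names, counts, and the helper `spike_in_size_factors` are made up for the example and are not taken from the actual data set.

```python
import math
from statistics import median

# Hypothetical toy counts: gene -> counts per sample (two samples).
counts = {
    "spike_A": [100, 200],
    "spike_B": [50, 100],
    "spike_C": [10, 20],
    "geneX": [500, 400],
}
spike_ids = ["spike_A", "spike_B", "spike_C"]  # assumed spike-in rows


def spike_in_size_factors(counts, spike_ids):
    """Median-of-ratios size factors using only the spike-in rows."""
    n_samples = len(counts[spike_ids[0]])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in spike_ids:
        row = counts[gene]
        if any(c <= 0 for c in row):
            continue  # geometric mean is undefined when a count is zero
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One size factor per sample: median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]


size_factors = spike_in_size_factors(counts, spike_ids)

# Divide every gene (not just the spike-ins) by the sample's size factor.
normalized = {
    gene: [c / s for c, s in zip(row, size_factors)]
    for gene, row in counts.items()
}
print(size_factors)
print(normalized["geneX"])
```

In this toy case the second sample has twice the spike-in signal of the first, so its endogenous counts are scaled down relative to the first sample.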

I have a question about the number of reads we get after the ERCC normalization.

When we sum the reads per sample before and after the ERCC normalization, we get the following numbers:

sample    before         after
ctrl1     24762847.36    33353900.56
ctrl2     24973604.51    30803080.63
ctrl3     24727427.27    22561350.5
treat1    24875474.53    18880174.62
treat2    24911409.33    23560508.28
treat3    24588778       21308227.81

As you can see, there is a clear trend towards fewer reads in the treated samples after taking the spike-ins into account and correcting for the transcriptional bias. This is in line with our expectations.

Our assumption is that the treated samples contain fewer polyA transcripts (an assumption we would like to verify).

Therefore I have two questions:

Are we correct in assuming that the change in the number of reads might be due to fewer polyA transcripts in the treated samples?

Secondly, is there a statistical way to quantify this difference in numbers and attach a significance value to it, so that we can say how significant these results are?

thanks

Assa

spike-In normalization DESeq DE genes • 3.7k views

Unfortunately it is not from single-cell experiments but from a standard RNA-Seq protocol. Interestingly, though, the bulk of the DE genes change in one direction: most of them are down-regulated.

Even though this was one of the predictions for the experiment, it is also one of the reasons we assume that there really is transcriptional amplification.


Given that it makes biological sense, it's probably real. I wouldn't worry too much about it being from bulk tissue; you'll just have to address some of the issues I mentioned to get it past reviewers. Anyway, it looks like you have some neat biology on your hands, best of luck!

6.7 years ago

Changes in polyA levels are certainly one possibility (whether that's due to transcription or turnover is a different question; enjoy doing the pulse-chase experiment). There are, of course, others. If this is RNA-seq from bulk tissue, then you have to worry about things like ensuring you had highly comparable amounts of starting tissue (in terms of weight, cell number, cell volume, etc.) and that the extraction efficiency is comparable between conditions. If these are single-cell experiments (you should probably have more replicates and have used/counted unique molecular identifiers!), you still have issues of cell size/shape. After all, talking about transcriptional amplification is less interesting if it's just because the cells are huge (or tiny, for the opposite).

Regarding a statistical test, you could just do a t-test between the summed normalized counts of the two groups. That's less telling (and not significant, given the N of 3) than mentioning ~2000 DE genes, but it's a single number at least. I would suggest looking at the balance of the direction of change of the DE genes. If you really do have a difference due to transcriptional amplification, then the fold-changes should be almost entirely in a single direction.
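
For what it's worth, here is a minimal Python sketch of both checks, using only the standard library. The first part runs a pooled-variance two-sample t-test on the "after" sums from the table in the question. The second part is an exact two-sided sign (binomial) test on the direction of change, which is one possible way to look at the balance of up- versus down-regulated genes. The up/down counts (200 vs. 1800) are purely hypothetical placeholders, since the thread does not give the real split, and the helper names are my own.

```python
import math
from statistics import mean, variance

# Spike-in-normalized read sums ("after" column) from the question.
ctrl = [33353900.56, 30803080.63, 22561350.50]
treat = [18880174.62, 23560508.28, 21308227.81]


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (
        math.sqrt(df * math.pi) * math.gamma(df / 2)
    )
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sample_t_test(a, b, steps=2000):
    """Pooled-variance t-test; returns (t, df, two-sided p-value)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    # Simpson's rule (steps must be even) for the area between 0 and |t|.
    h = abs(t) / steps
    area = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return t, df, max(0.0, 1 - 2 * area)


def sign_test(n_up, n_down):
    """Exact two-sided binomial test against a 50:50 up/down split."""
    n = n_up + n_down
    k = min(n_up, n_down)
    tail = sum(math.comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)


t_stat, df, p_value = two_sample_t_test(ctrl, treat)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3f}")

# Hypothetical split of the ~2000 DE genes; replace with the real counts.
p_direction = sign_test(n_up=200, n_down=1800)
print(f"sign test p = {p_direction:.3g}")
```

With only three replicates per group the t-test has very little power, so the direction-of-change check is likely to be the more convincing of the two.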