Question

Noise Analysis for RNAseq Data

0

5.9 years ago

biocool2018 ▴ 30

Hi All,

I have two RNA isolation techniques (I1, I2) for a specific experiment, and I would like to find out which one yields more reproducible sequencing data. I have three separate RNA-seq runs for each isolation technique, so three replicates per condition. What I had in mind is to fit a dispersion curve for each condition separately, using either voom or the dispersion estimation function of DESeq2 (estimateDispersions), and see which curve lies above the other. Does this approach make sense? If not, do you have any other suggestions? Is there a single number that I can derive from the dispersion curve to quantify dispersion? Any help is appreciated.

Thanks

RNA-Seq Deseq2 limma voom • 1.8k views

ADD COMMENT • link updated 5.9 years ago by Friederike 8.9k • written 5.9 years ago by biocool2018 ▴ 30

0

Some pointers:

First of all, you need numbers (replicates) for each technique to be able to interpret errors, biases and noise. Do you have those?
If so, look at density plots of the data, then choose a threshold for removing genes with zero or low counts based on that distribution.
Once that is done, take care of normalization (CPM, TMM, etc.). Look at boxplots before and after normalization. That largely summarizes the noise and helps you decide whether there is noise and whether you need a different normalization.
On the data before and after normalization you should also make plots such as MDS/PCA, as mentioned in the other reply. These help with outlier detection and also reveal the dominant sources of variability, whether biological, technical or simply noise.
Finally, I reckon you should have some housekeeping genes for your techniques, or ERCC spike-ins if those are part of your experimental design. Plots of their normalized values should be informative enough to tell whether there is noise. I believe all of this should be sufficient.
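
The filtering and CPM steps in the pointers above can be sketched as follows. This is a minimal illustration in plain Python, not the edgeR or limma implementation: the function names (`cpm`, `filter_low`), the thresholds (`min_cpm=1.0`, `min_samples=2`) and the toy counts are all assumptions made for the example, and TMM scaling is not included.

```python
def cpm(counts):
    """Counts per million for a dict of gene -> list of raw counts (one per sample).

    Hypothetical helper: library size is simply the column sum, no TMM factors.
    """
    n_samples = len(next(iter(counts.values())))
    lib_sizes = [sum(c[j] for c in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c[j] / lib_sizes[j] for j in range(n_samples)]
        for gene, c in counts.items()
    }


def filter_low(counts, min_cpm=1.0, min_samples=2):
    """Keep genes that reach min_cpm in at least min_samples samples.

    Both thresholds are assumed defaults; pick them from the density plot.
    """
    norm = cpm(counts)
    return {
        gene: counts[gene]
        for gene, values in norm.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }


# Toy counts for three replicates of one isolation technique (made-up numbers).
counts = {
    "geneA": [500, 480, 520],
    "geneB": [0, 1, 0],            # barely detected, should be dropped
    "geneC": [300, 310, 290],
    "geneD": [999200, 999209, 999190],
}

norm = cpm(counts)
filtered = filter_low(counts)
print(sorted(filtered))
```

The same filter would be applied to both techniques jointly (a gene kept in one and dropped in the other cannot be compared), and the boxplots mentioned above would then be drawn from the log of the CPM values.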

@Friederike has already given you some pointers; add mine to those and see what your data say.

P.S.: If you find an answer or comment helpful, vote for it or bookmark the thread so that it is useful for others.

ADD REPLY • link 5.9 years ago by ivivek_ngs ★ 5.2k

score 0 · Answer 1 · 2018-05-18

0

5.9 years ago

Friederike 8.9k

Sounds like a good first step. PCA/MDS plots should also give you a good idea of the variability between individual samples (as well as between the sample groups). Another global view could come from calculating the pairwise distances between the samples and seeing what those look like (e.g., is the correlation between the replicates of one condition higher than for the other condition? Don't be too put off if those differences turn out to be marginal, though).
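
To make the replicate-correlation idea concrete, here is a small sketch in plain Python. It assumes log-scale, normalized expression values with genes in the same order in every sample; the helper names (`pearson`, `mean_pairwise_correlation`) and the toy values are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(samples):
    """Average correlation over all replicate pairs within one condition."""
    pairs = list(combinations(samples, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Toy log-expression values: one list per replicate, four genes each (made up).
i1 = [[1.0, 5.0, 9.0, 3.0], [1.1, 5.1, 8.9, 3.0], [0.9, 4.9, 9.1, 3.1]]
i2 = [[1.0, 5.0, 9.0, 3.0], [2.0, 4.0, 8.0, 4.5], [0.2, 6.1, 9.5, 1.5]]

r_i1 = mean_pairwise_correlation(i1)
r_i2 = mean_pairwise_correlation(i2)
print(f"I1: {r_i1:.4f}  I2: {r_i2:.4f}")
```

With real data the correlations are usually all very close to 1 because highly expressed genes dominate, which is one reason the differences may look marginal.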

I'm also guessing that you have a couple of favorite genes, so looking at the rlog-transformed or TPM values of those may also be insightful, e.g. via heatmaps where every individual sample gets its own column.

I would probably also try to have a look at the most variable genes per condition and see if there are certain patterns, e.g. maybe I1 generates very reproducible results for short genes whereas I2 is more reliable for long genes (just a toy example).

Generally, your analyses should be guided by the ultimate questions/reasons that led you to think that either one technique might be superior.

ADD COMMENT • link 5.9 years ago by Friederike 8.9k

0

These are good points, thanks. At this stage of the analysis I am not interested in any particular gene; I need a unique number that somehow summarizes the mean-variance curve. The obvious first idea is the integral (AUC), but I wonder whether there is a more meaningful or principled quantity that statisticians use.
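
One candidate for such a number is a single dispersion shared by all genes, which is what edgeR calls the common dispersion (its square root is the biological coefficient of variation). The sketch below is only a crude moment-based analogue, not edgeR's estimator: it assumes a negative binomial model with variance = mean + dispersion × mean², counts already normalized to comparable depth, and uses the median of per-gene estimates. The function names and toy counts are hypothetical.

```python
import math
from statistics import mean, median, variance


def moment_dispersion(gene_counts):
    """Method-of-moments NB dispersion for one gene: (s^2 - m) / m^2, floored at 0.

    Returns None for genes with zero mean, where the estimate is undefined.
    """
    m = mean(gene_counts)
    if m <= 0:
        return None
    s2 = variance(gene_counts)
    return max((s2 - m) / (m * m), 0.0)


def summary_dispersion(count_matrix):
    """Median of the per-gene moment estimates: one number per condition."""
    estimates = [moment_dispersion(g) for g in count_matrix]
    return median(d for d in estimates if d is not None)


# Toy counts: one row per gene, three replicates per condition (made up).
i1 = [[100, 110, 90], [1000, 1050, 950], [20, 25, 15]]
i2 = [[100, 150, 50], [1000, 1300, 700], [20, 35, 5]]

disp_i1 = summary_dispersion(i1)
disp_i2 = summary_dispersion(i2)
print(f"I1 dispersion {disp_i1:.4f}, BCV {math.sqrt(disp_i1):.3f}")
print(f"I2 dispersion {disp_i2:.4f}, BCV {math.sqrt(disp_i2):.3f}")
```

With three replicates the per-gene estimates are very noisy, so a summary like this is only meaningful across many genes.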

ADD REPLY • link 5.9 years ago by biocool2018 ▴ 30

0

What do you need that unique number for, and why would it be more useful than the actual curves (which may contain more information)?

ADD REPLY • link 5.9 years ago by Friederike 8.9k

0

What do you mean by unique number? Are you referring to the mean-variance trend plot and then calculating a scaling factor based on your normalization factors?

ADD REPLY • link 5.9 years ago by ivivek_ngs ★ 5.2k

0

If the data were not heteroskedastic, i.e. if the variance did not depend on the mean, I would do an F-test to see whether the variances are statistically different. Right now I have two curves showing the mean-variance trend. What statistic could I use to show that one curve is higher than the other? Does this make sense?
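
One way to sidestep the dependence on the mean, offered here only as a sketch and not as something proposed in this thread, is to compare the two techniques gene by gene: each gene has roughly the same mean under both techniques, so the per-gene log ratio of variances is a paired quantity, and a sign test asks whether one technique has the larger variance more often than chance would suggest. The code assumes normalized log-scale values, treats genes as independent (which they are not, so the p-value is rough), and uses hypothetical function names and toy data.

```python
import math
from statistics import median, variance


def log2_variance_ratios(expr_a, expr_b):
    """Per-gene log2(var_a / var_b); genes with zero variance in either set are skipped."""
    ratios = []
    for a, b in zip(expr_a, expr_b):
        va, vb = variance(a), variance(b)
        if va > 0 and vb > 0:
            ratios.append(math.log2(va / vb))
    return ratios


def sign_test_p(values):
    """Two-sided exact sign test: are positive and negative values equally likely?"""
    pos = sum(v > 0 for v in values)
    neg = sum(v < 0 for v in values)
    n = pos + neg
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy log-expression values: one row per gene, three replicates each (made up).
expr_i1 = [[5.0, 5.1, 4.9], [8.0, 8.1, 7.9], [2.0, 2.2, 1.8],
           [10.0, 10.1, 9.9], [6.5, 6.6, 6.4], [3.0, 3.1, 2.9]]
expr_i2 = [[5.0, 5.5, 4.5], [8.0, 8.6, 7.4], [2.0, 2.9, 1.1],
           [10.0, 10.4, 9.6], [6.5, 7.0, 6.0], [3.0, 3.6, 2.4]]

ratios = log2_variance_ratios(expr_i1, expr_i2)
p_value = sign_test_p(ratios)
print(f"median log2 variance ratio (I1/I2): {median(ratios):.2f}, sign test p = {p_value:.4f}")
```

The median log ratio doubles as a single-number effect size: a negative value means I1 is the less variable technique for the typical gene.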

ADD REPLY • link 5.9 years ago by biocool2018 ▴ 30