Question: Noise Analysis for RNAseq Data
21 months ago, biocool2018 wrote:

Hi All,

I have two RNA isolation techniques (I1, I2) for a specific experiment, and I would like to find out which one yields more reproducible sequencing data. I have three separate RNA-seq runs using both isolation techniques, so three replicates per condition. What I had in mind is to fit a dispersion curve separately for each condition, using either voom or the dispersion estimation function of DESeq2, and see which curve lies above the other. Does this approach make sense? If not, do you have any other suggestions? Is there a single number, derived from the dispersion curve, that I can use to quantify dispersion? Any help is appreciated.
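One way to picture the "single number" idea is sketched below. It is not what DESeq2 or voom do internally: it assumes a plain method-of-moments estimate under a negative binomial model, var = mu + alpha * mu^2, computed per gene and then summarized by the median. The function names and the count values are hypothetical, and the counts are assumed to be already normalized for library size.

```python
from statistics import mean, median, variance

def mom_dispersion(counts):
    """Method-of-moments dispersion for one gene, from var = mu + alpha * mu^2."""
    mu = mean(counts)
    if mu <= 0:
        return None  # dispersion is undefined for an all-zero gene
    var = variance(counts)  # sample variance (n - 1 denominator)
    # Clamp at zero: with three replicates the estimate is often negative.
    return max((var - mu) / mu ** 2, 0.0)

def median_dispersion(count_rows):
    """Median of the per-gene dispersions, ignoring all-zero genes."""
    values = [d for d in map(mom_dispersion, count_rows) if d is not None]
    return median(values)

# Hypothetical normalized counts: one row per gene, one column per replicate.
i1 = [[100, 120, 80], [20, 28, 12], [500, 560, 440]]
i2 = [[100, 150, 50], [20, 35, 5], [500, 700, 300]]

summary_i1 = median_dispersion(i1)
summary_i2 = median_dispersion(i2)
print(summary_i1, summary_i2)
```

With only three replicates the per-gene estimates are very noisy, which is why the dedicated packages shrink them towards a fitted trend; the median here is only a rough summary. Its square root is sometimes read as a typical coefficient of variation.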


voom rna-seq limma deseq2 • 760 views
modified 21 months ago by Friederike • written 21 months ago by biocool2018

Some pointers:

  1. First, you need replicates for each technique in order to interpret errors, biases and noise. Do you have that?
  2. If so, look at density plots of the data and, depending on the distribution, choose a threshold for removing genes with zero or low counts.
  3. Once that is done, take care of the normalization (CPM, TMM, etc.). Look at boxplots before and after normalization. They largely summarize the noise and help you see whether noise remains or whether you need some other normalization.
  4. Before and after normalization, you should also make complementary plots such as MDS/PCA, as mentioned in the other query. These help with outlier detection and also reveal the most prominent sources of variability, whether biological, technical or simply noise.
  5. Finally, I reckon you should have some housekeeping genes for your techniques, or estimates from ERCC spike-ins if those are part of your experimental design. Normalized plots of these should be informative enough to tell whether there is noise or not. I believe all of this should be good enough.
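Steps 2 and 3 of the list can be sketched roughly as follows. This is a minimal illustration assuming plain counts-per-million scaling only; TMM is not implemented here, and the thresholds (`min_cpm`, `min_samples`), the function names and the count table are all hypothetical choices rather than recommendations.

```python
def cpm(count_rows):
    """Counts per million for a genes x samples table given as a list of rows."""
    # Library size = column sum; assumes every sample has at least one read.
    lib_sizes = [sum(col) for col in zip(*count_rows)]
    return [[1e6 * c / n for c, n in zip(row, lib_sizes)] for row in count_rows]

def keep_expressed(count_rows, min_cpm=1.0, min_samples=2):
    """Keep genes whose CPM reaches min_cpm in at least min_samples samples."""
    table = cpm(count_rows)
    return [row for row, norm in zip(count_rows, table)
            if sum(v >= min_cpm for v in norm) >= min_samples]

# Hypothetical raw counts: three genes (rows) by three replicates (columns).
counts = [
    [0, 0, 1],
    [500, 600, 550],
    [499500, 399400, 449449],
]

normalized = cpm(counts)
kept = keep_expressed(counts)
print(len(kept), "of", len(counts), "genes kept")
```

The filtered, normalized values are what one would then feed into the boxplots and MDS/PCA plots from steps 3 and 4.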

@Friederike already provided you with some pointers; I am adding mine so you can see what your data say.

P.S.: If you find the answers/comments helpful, vote for or bookmark the thread so that it is useful for others.

modified 21 months ago • written 21 months ago by ivivek_ngs
21 months ago, Friederike (United States) wrote:

Sounds like a good first step. PCA/MDS plots should also give you a good idea of the variability between individual samples (as well as between the sample groups). Another global view could come from calculating the pairwise distances between the samples and seeing what those look like (e.g., is the correlation between the replicates of one condition higher than for the other condition? Don't be too put off if those differences turn out to be marginal, though).
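The replicate-correlation comparison could look roughly like this. It is only a sketch: it assumes Pearson correlation on log2(count + 1) values, averaged over all replicate pairs within a technique, and the helper names and the numbers are made up for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_replicate_correlation(samples):
    """Average Pearson r of log2(count + 1) over all pairs of replicates."""
    logged = [[math.log2(v + 1) for v in s] for s in samples]
    pairs = list(combinations(logged, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

# Hypothetical normalized counts: one list per replicate, one entry per gene.
i1_reps = [[100, 20, 500, 5], [110, 22, 480, 6], [95, 19, 510, 5]]
i2_reps = [[100, 20, 500, 5], [160, 8, 300, 15], [60, 40, 700, 1]]

r_i1 = mean_replicate_correlation(i1_reps)
r_i2 = mean_replicate_correlation(i2_reps)
print(round(r_i1, 3), round(r_i2, 3))
```

On real data the correlations within both techniques are usually very close to 1, so small differences should be read with caution.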

I'm also guessing that you have a couple of favorite genes, so looking at the rlog-transformed or TPM values of those may also be insightful, e.g. via heatmaps where every individual sample gets its own column.

I would probably also try to have a look at the most variable genes per condition and see if there are certain patterns, e.g. maybe I1 generates very reproducible results for short genes whereas I2 is more reliable for long genes (just a toy example).

Generally, your analyses should be guided by the ultimate questions/reasons that led you to think that one technique might be superior to the other.

written 21 months ago by Friederike

These are good points, thanks. At this stage of the analysis I am not interested in any particular gene; I need a single, unique number that somehow summarizes the mean-variance curve. The obvious first idea is the integral (area under the curve), but I wonder whether there is a more meaningful or principled quantity that statisticians use.

modified 21 months ago • written 21 months ago by biocool2018

What do you need that unique number for, and why would it be more useful than the actual curves (which may contain more information)?

written 21 months ago by Friederike

What do you mean by unique number? Are you referring to the mean-variance trend plot and then calculating a scaling factor based on your normalization factors?

written 21 months ago by ivivek_ngs

If the data were not heteroskedastic and the variance did not depend on the mean, I would do an F-test to compare the variances and see whether they are statistically different. Right now I have two curves showing the mean-variance trend; what statistic should I use to show that one curve is higher than the other? Does this make sense?
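One possible way around the mean dependence, offered only as a sketch and not something proposed in this thread, is to compare the two techniques gene by gene. Each gene has roughly the same mean under both techniques, so the per-gene ratio of variances is less affected by the trend. The code below assumes log-scale expression values, takes log2(var_I1 / var_I2) per gene, and applies an exact two-sided sign test. The function names and data are hypothetical. Because genes are not independent, the p-value is likely to be too optimistic.

```python
import math
from statistics import variance

def log_variance_ratios(log_i1, log_i2):
    """Per-gene log2(var_I1 / var_I2); genes with a zero variance are skipped."""
    ratios = []
    for a, b in zip(log_i1, log_i2):
        va, vb = variance(a), variance(b)
        if va > 0 and vb > 0:
            ratios.append(math.log2(va / vb))
    return ratios

def sign_test_p(ratios):
    """Two-sided exact sign test of H0: the median log ratio is zero."""
    pos = sum(r > 0 for r in ratios)
    neg = sum(r < 0 for r in ratios)
    n = pos + neg  # ties (ratio exactly zero) are dropped
    if n == 0:
        return 1.0
    k = min(pos, neg)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical log2 expression values: one row per gene, one column per replicate.
log_i1 = [[6.6, 6.7, 6.5], [4.3, 4.4, 4.2], [9.0, 9.1, 8.9],
          [2.5, 2.7, 2.4], [7.1, 7.0, 7.2], [5.5, 5.6, 5.4]]
log_i2 = [[6.6, 7.3, 5.9], [4.3, 5.1, 3.2], [9.0, 8.2, 9.5],
          [2.5, 4.0, 1.0], [7.1, 6.2, 7.9], [5.5, 6.4, 4.8]]

ratios = log_variance_ratios(log_i1, log_i2)
p_value = sign_test_p(ratios)
print(len(ratios), p_value)
```

The median of `ratios` could also serve as the single summary number asked for above: a negative median would suggest that I1 is the less variable technique in this toy setup.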

modified 21 months ago • written 21 months ago by biocool2018