Question: DESeq2 (or edgeR) Exploratory Analysis with no Replicates
gabriel.jabud30 (United States) wrote, 10 months ago:

My pipeline so far is hisat2 -> featureCounts -> DESeq2. I have generated heatmaps of the genes with the most variance after rlog and log2 transformation, which are somewhat meaningful. What I really want to do is compare everything to the control sample and take the genes with the largest log fold change in either direction. I've read through the DESeq2 vignette and haven't found a good example of that. Maybe I do this through the design parameter when running DESeqDataSetFromMatrix()? So far I've only set the design parameter to ~condition, as I'm a little shaky on how that parameter works.

Maybe this is more of an R problem than a DESeq2 one? Is edgeR the better tool, since it allows you to do some analysis with no biological replicates by setting the dispersion value?
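On the design question, here is a minimal base-R sketch of what ~condition encodes. It assumes three hypothetical single-sample conditions (control, acid, heat; the names are invented for illustration). With the control set as the reference level, each non-intercept column corresponds to one "condition vs control" comparison. In DESeq2 the same releveled factor would sit in colData before calling DESeqDataSetFromMatrix(); that step is not shown here.

```r
# Hypothetical sample sheet: one library per condition (names are made up).
coldata <- data.frame(
  condition = c("control", "acid", "heat"),
  row.names = c("s1", "s2", "s3")
)

# Factor levels default to alphabetical order, which would make "acid" the
# reference; relevel() makes "control" the baseline explicitly.
coldata$condition <- relevel(factor(coldata$condition), ref = "control")

# The design formula expands to one intercept (the control) plus one
# indicator column per non-reference condition.
design <- model.matrix(~ condition, data = coldata)
print(levels(coldata$condition))
print(colnames(design))
```

Note that with three samples and three design columns there are no residual degrees of freedom, which is why variability cannot be estimated from unreplicated data.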

rna-seq edger deseq2 • 1.3k views
modified 6 months ago by Konstantinos Yeles100 • written 10 months ago by gabriel.jabud30

You can do a DESeq2 analysis with no replicates; the stats are just essentially meaningless, as they would be for any other tool or package trying to compare RNA-seq between single samples.

written 10 months ago by jared.andrews075.3k

Yes, so I can make heatmaps from the log-normalized counts and do things like PCA (and I have). My question is more about what other analysis I can do and how I can compare everything to the control sample in DESeq2. For example, say I want a list of the most differentially expressed genes vs the control sample, starting with the featureCounts matrix, which I've imported. Currently I'm not comparing everything to the control, but to each other. So I can get the list of genes with the most variance with something like:

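# Assumes DESeq2 is loaded and rld <- rlog(dds); rowVars() is from matrixStats (or genefilter)
library(pheatmap)
# Indices of the 20 genes with the highest variance across samples
topVarGenes <- head(order(-rowVars(assay(rld))), 20)
mat <- assay(rld)[topVarGenes, ]
mat <- mat - rowMeans(mat)  # center each gene on its mean across samples
pheatmap(mat, show_rownames = TRUE, cluster_cols = FALSE)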

but that's not as meaningful as the genes that are most different from control.

Running results(dds) on the data actually gives an error that DESeq2 no longer supports experiments with only one replicate, so I don't get the nice summary that a well-designed experiment would give.
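One way to get "most different from control" without any test statistics is to subtract the control column from the log-scale matrix. The sketch below is base R only and uses a made-up matrix in place of assay(rld); the helper names lfc_vs_control and top_changed are hypothetical, not DESeq2 functions. Because rlog values are on a log2 scale, the differences can be read as (shrunken) log2 fold changes, but they carry no p-values and say nothing about significance.

```r
# Sketch: per-sample log2 fold change versus a single control column.
# 'mat' stands in for assay(rld); the values below are invented.
lfc_vs_control <- function(mat, control) {
  stopifnot(control %in% colnames(mat))
  others <- setdiff(colnames(mat), control)
  # Subtracting a length-nrow vector recycles down each column,
  # so every remaining sample is compared to the control gene by gene.
  mat[, others, drop = FALSE] - mat[, control]
}

# Rank genes by absolute change in one sample (either direction).
top_changed <- function(lfc, sample, n = 20) {
  ord <- order(abs(lfc[, sample]), decreasing = TRUE)
  lfc[head(ord, n), sample, drop = FALSE]
}

mat <- matrix(c(5, 8, 3, 6,    # control
                7, 8, 0, 6,    # treatA
                5, 2, 3, 9),   # treatB
              nrow = 4,
              dimnames = list(paste0("gene", 1:4),
                              c("control", "treatA", "treatB")))

lfc <- lfc_vs_control(mat, "control")
print(top_changed(lfc, "treatA", n = 2))
```

The rows returned by top_changed() could then be passed to pheatmap() in place of the top-variance genes.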

modified 10 months ago • written 10 months ago by gabriel.jabud30

Please use the search function and read through what you can find on the BioC support page and Google. I understand it is frustrating to analyse underpowered/unpowered experiments, but this question really has been discussed like a hundred times before. Please go through the previous content and see what you can take away from it. Don't be surprised if this question gets closed by a different moderator for the aforementioned reason.

written 10 months ago by ATpoint32k

Do you have a particular thread in mind? I have looked at all those pages pretty extensively and none really cover what I'm looking for.

written 10 months ago by gabriel.jabud30

As ATpoint highlights, there is a lot of material and discussion out there. Just search via your search engine of choice. For one, there is the edgeR manual (see '2.11 What to do if you have no replicates'):

edgeR is primarily intended for use with data including biological replication. Nevertheless, RNA-Seq and ChIP-Seq are still expensive technologies, so it sometimes happens that only one library can be created for each treatment condition. In these cases there are no replicate libraries from which to estimate biological variability. In this situation, the data analyst is faced with the following choices, none of which are ideal. We do not recommend any of these choices as a satisfactory alternative for biological replication. Rather, they are the best that can be done at the analysis stage, and options 2–4 may be better than assuming that biological variability is absent.
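The approach the question alludes to (fixing the dispersion by hand) can be sketched as follows. This assumes the edgeR package is installed, uses simulated counts rather than real data, and picks a BCV of 0.4 purely as a placeholder. That value is an assumption, not an estimate, and the resulting p-values depend on it entirely.

```r
# Sketch only: toy counts, one library per condition, and an ASSUMED BCV.
# Guarded so that nothing runs if edgeR is not installed.
if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)

  set.seed(1)
  counts <- matrix(rnbinom(200, mu = 100, size = 10), ncol = 2,
                   dimnames = list(paste0("gene", 1:100),
                                   c("control", "treated")))
  group <- factor(c("control", "treated"), levels = c("control", "treated"))

  y <- DGEList(counts = counts, group = group)
  y <- calcNormFactors(y)

  bcv <- 0.4  # assumed biological coefficient of variation, not estimated
  # With no replicates the dispersion cannot be estimated, so it is supplied.
  et <- exactTest(y, pair = c("control", "treated"), dispersion = bcv^2)

  # logFC is treated relative to control; rank and inspect the top genes.
  top <- topTags(et, n = 10)
  print(top$table)
}
```

Trying a few different BCV values and checking how the gene ranking changes is a reasonable sanity check, since the ranking by logFC is more trustworthy here than any p-value.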

As for ideas other than heatmaps, etc., I am going to put a question back to you: why did you do the experiment in the first place if you did not even know the analysis plan that was going to be carried out? Perhaps I missed this somewhere in your original question. Would running a few cDNA microarrays not have been better?

modified 10 months ago • written 10 months ago by Kevin Blighe56k

I didn't design the experiment; I inherited the data from a previous researcher and want to make use of it. I did read the edgeR manual and I will try to generate useful figures from that next. I guess I should have restated my original question as "How do I view log fold changes vs a control sample with no replicates using DESeq2"; it seems like people are misinterpreting my original question.

written 10 months ago by gabriel.jabud30

Perhaps we are misinterpreting it; however, I personally want to put a stop to the propagation of 'noise' in research. Poor experimental design is one of the key reasons why so many published works that research the same thing are not reproducible.

written 10 months ago by Kevin Blighe56k
Konstantinos Yeles100 wrote, 6 months ago:

In addition to the edgeR manual, you could use the NOISeq package, which has a function precisely for cases without biological replication. See the manual, chapter 5.1.2, 'NOISeq-sim: no replicates available'.

Good luck with the analysis.

written 6 months ago by Konstantinos Yeles100