Question: Analyzing RNA-seq without replicates
2.9 years ago, shawn.w.foley (1.2k) wrote:


I'm currently analyzing an RNA-seq experiment consisting of clinical patient samples pre- and post-treatment, for individuals that had no response (NR, n=6), partial response (PR, n=4), or complete response (CR, n=2) to our compound. Unfortunately, no replicates were collected for each individual patient, but we're doing the best we can with these samples. The goal is hypothesis generation for downstream validation. Our main questions are:

1) Which genes consistently change expression after treatment?

2) Which genes change specifically in CR/PR patients and are unchanged in NR patients after treatment?

I'm trying to determine the best way to analyze these data with these limited resources. I've analyzed the pre- and post-treatment samples with CuffDiff and DESeq2, and the results are markedly different. I'm currently trying to analyze them with IsoEM2/IsoDE2, as these perform bootstrapping to report confidence intervals and were designed for experiments without replicates. Do you have any insight into which of these programs (or a different one) would be best suited for an experiment without replicates? There doesn't seem to be any consensus in the literature, so I was hoping for any input.

Ultimately, I plan on calling differentially expressed genes by pooling the two CR, four PR, and six NR patients as "biological replicates" to determine genes that change within each group, then looking at the fold change of these genes within each individual patient. Does this sound like a reasonable approach?

I've been wondering if there is a reasonable way to analyze each of these patients individually, then find which genes are consistently differentially expressed. I'm hesitant to put any faith into the reported p-values from DE programs, as there are no replicates. Would it be reasonable to use expression (minimum FPKM cutoff) and log2-fold change to call "putative differentially expressed genes" in each patient, then examine the overlap? Or am I opening a can of worms with this line of thinking?

Thank you very much for the help, this is a wonderful community!


I have essentially the same characteristics in the dataset which I'm currently analysing. Using the different patients with the same clinical outcome as replicates doesn't work because there's a lot of heterogeneity between them, so no genes are found to be significantly differentially expressed. That kind of analysis only really works for small experiments using cell lines. Using a method such as GFold, also recommended by the advice linked to in the other comment, is a feasible approach to get some rankings for each pair of samples belonging to a patient. Once you show those to a biologist, it'll be apparent that different patients have different mechanisms of resistance, demonstrating why treating the different patients as biological replicates is not viable.

written 2.9 years ago by dario.garvan (480)
2.9 years ago, i.sudbery (10k; Sheffield, UK) wrote:

Do not analyse the patients separately. In that design you have no replicates. And even if you could analyse them without replicates, looking for overlaps is a terrible way of finding which effects are significant - it relies on the arbitrary thresholds you have chosen to call significance having some non-arbitrary meaning.
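To make the point about arbitrary thresholds concrete, here is a minimal Python sketch (not part of the original answer). The expression values are invented toy numbers for a single hypothetical gene, and the pseudocount and cutoffs are arbitrary choices made only for illustration.

```python
import math

# Hypothetical toy data: (pre, post) expression of ONE gene in six patients.
# These numbers are invented for illustration only.
pairs = [(10.0, 19.0), (12.0, 25.0), (8.0, 15.5),
         (20.0, 41.0), (15.0, 29.0), (9.0, 18.5)]

def log2_fold_change(pre, post, pseudocount=1.0):
    """log2 ratio of post to pre; the pseudocount avoids division by zero."""
    return math.log2((post + pseudocount) / (pre + pseudocount))

def patients_passing(pairs, cutoff):
    """Number of patients whose |log2FC| meets or exceeds the cutoff."""
    return sum(abs(log2_fold_change(pre, post)) >= cutoff for pre, post in pairs)

# The same gene looks "consistent" or "inconsistent" depending on the cutoff.
for cutoff in (0.8, 0.9, 1.0):
    print(f"cutoff {cutoff}: called in {patients_passing(pairs, cutoff)} of {len(pairs)} patients")
```

With these toy values the gene is called in 6, 4, or 2 of the six patients as the cutoff moves from 0.8 to 1.0, even though every patient shows a roughly two-fold increase. The overlap reflects the cutoff rather than the biology.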

But if you analyse them together you do have biological (but not technical) replicates. There is absolutely nothing wrong with this design. It's not a trick or a fudge; it's the correct design for the experiment. See my answer to "C: Replicates for RNA-seq from 1 cell line undergoing different treatments" for more discussion of what is a biological and what is a technical replicate.

As mentioned by Friederike, analysis of an experiment with almost exactly this design is explained in section 3.5 of the edgeR user manual.
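As a rough illustration of what such a design looks like, the sketch below builds a design matrix in plain Python for the 6 NR, 4 PR and 2 CR patients from the question. It is not taken from the edgeR manual: the patient IDs and column names are hypothetical, and in practice the matrix would be built in R (for example with `model.matrix`) and passed to edgeR, DESeq2 or limma. The layout assumed here is one baseline term per patient plus one post-treatment effect per response group.

```python
# Sample annotation from the question: 6 NR, 4 PR and 2 CR patients,
# each sequenced pre- and post-treatment. Patient IDs are hypothetical.
groups = {"NR": 6, "PR": 4, "CR": 2}

samples = []  # one (patient, group, timepoint) tuple per library
for group, n in groups.items():
    for i in range(1, n + 1):
        for timepoint in ("pre", "post"):
            samples.append((f"{group}{i}", group, timepoint))

patients = sorted({patient for patient, _, _ in samples})

# Columns: intercept (first patient is the baseline), one indicator per
# remaining patient, then one treatment-effect column per response group
# that is 1 only for post-treatment samples of that group.
columns = (["Intercept"]
           + [f"patient_{p}" for p in patients[1:]]
           + [f"{g}.post" for g in groups])

design = []
for patient, group, timepoint in samples:
    row = [1]
    row += [int(patient == p) for p in patients[1:]]
    row += [int(group == g and timepoint == "post") for g in groups]
    design.append(row)

# Degrees of freedom left over for estimating variability.
residual_df = len(design) - len(columns)
print(f"{len(design)} samples x {len(columns)} coefficients, residual df = {residual_df}")
```

The patient columns absorb each individual's baseline expression, so the three `.post` coefficients estimate the average within-patient treatment effect in each response group. Question 2 then becomes a contrast between those coefficients (for example `CR.post` versus `NR.post`), and the 24 samples leave 9 residual degrees of freedom, which is why the pooled analysis does have replication.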

Definitely use edgeR, DESeq2 or limma-voom to do the analysis (they all use approximately the same algorithm). I generally prepare counts using salmon and then use tximport to import the data into R. Don't use CuffDiff: it cannot handle these kinds of complex experimental designs.


Thank you for the help, this input has been great!

DESeq2 will also allow me to calculate the expression fold changes for each individual patient pre- and post-treatment. I want to confirm that I should not put any weight on the reported p-values/FDRs for the individual patients. Instead, I should take the lists of significantly changed genes from the pooled CR, PR, or NR patients, then simply use the fold changes for the individual patients to see how these genes change across all of our individuals. Does this sound reasonable?

written 2.9 years ago by shawn.w.foley (1.2k)

Yes. The fold changes for each patient will still be indicative of what is happening for that patient, but it's difficult to know how accurate they are.
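A minimal sketch of that last step, assuming you already have a list of genes called significant in the pooled analysis. The gene names, patient IDs and normalised counts below are all invented, and the pseudocount of 1 is an arbitrary choice.

```python
import math

# Hypothetical normalised counts: gene -> patient -> (pre, post).
# All values are invented for illustration only.
counts = {
    "GENE_A": {"CR1": (50.0, 200.0), "CR2": (40.0, 170.0), "NR1": (60.0, 58.0)},
    "GENE_B": {"CR1": (300.0, 150.0), "CR2": (280.0, 120.0), "NR1": (310.0, 305.0)},
}

# Genes assumed to have been called significant in the pooled, paired analysis.
significant_genes = ["GENE_A", "GENE_B"]

def log2_fold_change(pre, post, pseudocount=1.0):
    """log2 ratio of post to pre; the pseudocount avoids division by zero."""
    return math.log2((post + pseudocount) / (pre + pseudocount))

# Per-patient fold changes, used descriptively only (no p-values attached).
per_patient = {
    gene: {
        patient: round(log2_fold_change(pre, post), 2)
        for patient, (pre, post) in counts[gene].items()
    }
    for gene in significant_genes
}

for gene, lfcs in per_patient.items():
    print(gene, lfcs)
```

These per-patient values are point estimates from a single pair of libraries, so ratios for lowly expressed genes will be especially noisy. That is the sense in which it is hard to know how accurate they are.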

written 2.9 years ago by i.sudbery (10k)
2.9 years ago, Friederike (6.7k; United States) wrote:

The approach you describe seems very reminiscent of the paired experimental design as described, for example, in the edgeR user manual (section 3). I strongly recommend you try that approach (or use DESeq2 with the appropriate design formula).
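To show why the pairing matters, here is a small Python sketch comparing an unpaired and a paired t-statistic for one hypothetical gene. This only illustrates the principle: edgeR and DESeq2 fit negative binomial models rather than t-tests, and the log2 expression values below are invented.

```python
import math
import statistics

# Hypothetical log2 expression of one gene in four patients (invented values).
# Baselines differ a lot between patients; the treatment shift is similar.
pre = [5.0, 8.0, 11.0, 6.5]
post = [5.9, 9.1, 11.8, 7.6]

def welch_t(a, b):
    """Unpaired (Welch) t-statistic: ignores which samples share a patient."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se

def paired_t(a, b):
    """Paired t-statistic: computed on the within-patient differences."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

print("unpaired t:", round(welch_t(pre, post), 2))
print("paired t:  ", round(paired_t(pre, post), 2))
```

In the unpaired comparison the patient-to-patient differences in baseline swamp the treatment effect, while the paired statistic is far larger because each patient acts as their own control. Including a patient term in the design formula achieves the same thing in a count-based model.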
