Question

Differential expression without technical replicates

3

10.1 years ago

wstfljs ▴ 100

Hello everyone! I have a question about whether it makes sense to run a differential expression analysis without having technical replicates. What I have now is RNA-seq data from four developmental stages of a parasite (1 biological replicate per sample, no technical replicates). The genome size is ~40 Mb, and each sample has between 100 and 170 million paired-end reads, so the sequencing depth is very high. I have used both DESeq and Cufflinks to estimate expression at the gene level. With no replicates, should I just focus on the fold change between samples? I'm not sure whether the p-values calculated by DESeq and Cufflinks are meaningful without replicates. Thanks for any suggestions!

cufflinks RNA-Seq DESeq • 5.6k views

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 10.1 years ago by wstfljs ▴ 100

Ram · Answer 1 · 2015-06-03

4

10.1 years ago

Devon Ryan 105k

Technical replicates are largely useless in RNA-seq; biological replicates, in contrast, are vital. If you have only a single sample per condition (it's somewhat unclear whether your "1 biological replicate" means 1 or 2 samples per condition/timepoint), then yes, you're largely stuck looking at fold-changes. You might try GFold, though I don't know if it allows time-course designs. Regardless of the tool, be highly suspicious of the results; they'll only be vaguely useful.
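To make "looking at fold-changes" concrete, here is a minimal sketch of a per-gene log2 fold-change between two unreplicated samples. The gene names and counts are made up, the function name is hypothetical, and the total-count (CPM) scaling with a pseudocount is a simplifying assumption; real tools use more robust normalization.

```python
import math

def log2_fold_changes(counts_a, counts_b, pseudocount=0.5):
    """Per-gene log2(B / A) on counts-per-million values.

    The pseudocount (an arbitrary choice here) avoids log(0) and damps
    extreme ratios for genes with very few reads.
    """
    lib_a = sum(counts_a.values())  # library size of sample A
    lib_b = sum(counts_b.values())  # library size of sample B
    result = {}
    for gene in counts_a:
        cpm_a = (counts_a[gene] + pseudocount) / lib_a * 1e6
        cpm_b = (counts_b.get(gene, 0) + pseudocount) / lib_b * 1e6
        result[gene] = math.log2(cpm_b / cpm_a)
    return result

# Hypothetical raw counts for two developmental stages (made-up numbers).
stage1 = {"geneA": 100, "geneB": 2000, "geneC": 0}
stage2 = {"geneA": 400, "geneB": 2000, "geneC": 30}

lfc = log2_fold_changes(stage1, stage2)
# Rank genes by absolute log2 fold-change, largest first.
for gene, value in sorted(lfc.items(), key=lambda kv: -abs(kv[1])):
    print(f"{gene}\t{value:+.2f}")
```

Note that the largest fold-changes tend to come from genes with very low counts (geneC here), which is one reason a ranking like this should be read with caution.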

ADD COMMENT • link updated 2.5 years ago by Ram 45k • written 10.1 years ago by Devon Ryan 105k

0

Sorry, what I have is altogether 4 RNA-seq datasets: 1 dataset per developmental stage of the parasite. Additionally, a couple of genes were tested with RT-PCR by my colleague, so I can always use those as a sort of benchmark to see whether the fold changes observed in RNA-seq are reflected in RT-PCR. Also, there is another very closely related species for which gene expression was estimated in two (out of four) developmental stages, this time with several biological and technical replicates. I'm assuming these results could also serve to check whether the expression patterns observed in my analyses are correct.
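One simple way to run that kind of benchmark is to compare log2 fold-changes for the genes measured on both platforms. The sketch below uses made-up values and hypothetical helper names; with only a couple of genes, the numbers it prints are a sanity check rather than evidence.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def sign_agreement(xs, ys):
    """Fraction of genes whose fold-change has the same direction in both lists."""
    same = sum(1 for x, y in zip(xs, ys) if (x > 0) == (y > 0))
    return same / len(xs)

# Hypothetical log2 fold-changes for the same five genes (made-up numbers).
rnaseq_lfc = [2.1, -1.3, 0.4, 3.0, -0.8]
rtpcr_lfc = [1.8, -1.0, -0.2, 2.5, -1.1]

r = pearson(rnaseq_lfc, rtpcr_lfc)
agree = sign_agreement(rnaseq_lfc, rtpcr_lfc)
print(f"Pearson r = {r:.2f}, direction agreement = {agree:.0%}")
```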

ADD REPLY • link updated 2.5 years ago by Ram 45k • written 10.1 years ago by wstfljs ▴ 100

0

That seems reasonable.

ADD REPLY • link 10.1 years ago by Devon Ryan 105k

Ram · Answer 2 · 2015-06-03

For time-series RNA-Seq data without replicates, STEM (Short Time-series Expression Miner) is a pretty good option. STEM cannot identify significantly differentially expressed genes. Instead, it groups genes based on their expression patterns along the time course and tells us whether a group of genes is significantly enriched for particular GO terms. Of course, this software gives more reliable results with biological replicates.
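The grouping idea can be illustrated with a heavily simplified sketch: build a set of model profiles that start at 0 and move by at most one unit per time point, then assign each gene to the profile it correlates with best. This is not STEM's actual implementation; it omits STEM's selection of representative profiles, its significance testing, and the GO enrichment step. All function names and expression values below are hypothetical.

```python
import itertools
import math

def model_profiles(n_points, max_step=1):
    """Profiles that start at 0 and change by an integer in [-max_step, max_step] per step."""
    steps = range(-max_step, max_step + 1)
    profiles = []
    for deltas in itertools.product(steps, repeat=n_points - 1):
        if all(d == 0 for d in deltas):
            continue  # flat profile has zero variance, so correlation is undefined
        profile = [0]
        for d in deltas:
            profile.append(profile[-1] + d)
        profiles.append(tuple(profile))
    return profiles

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denom = math.sqrt(sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys))
    return cov / denom if denom > 0 else 0.0

def assign_profiles(expression, profiles):
    """Map each model profile to the genes whose log-ratio series correlates best with it."""
    groups = {}
    for gene, values in expression.items():
        # Log2 ratio relative to the first time point (+1 avoids log(0)).
        series = [math.log2((v + 1) / (values[0] + 1)) for v in values]
        if len(set(series)) == 1:
            continue  # constant genes carry no temporal pattern
        best = max(profiles, key=lambda p: pearson(series, p))
        groups.setdefault(best, []).append(gene)
    return groups

# Hypothetical expression values across four developmental stages.
expression = {
    "up1": [7, 15, 31, 63],
    "down1": [63, 31, 15, 7],
    "peak1": [7, 15, 31, 15],
    "up2": [3, 7, 15, 31],
    "flat1": [20, 20, 20, 20],
}
profiles = model_profiles(n_points=4)
groups = assign_profiles(expression, profiles)
for profile, genes in groups.items():
    print(profile, genes)
```

With single samples per stage, a gene's assignment can flip on noise alone, so groups like these are best treated as hypotheses to check against other evidence.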

STEM was originally developed by Jason Ernst for microarray analysis, but it can analyze RNA-Seq data very well, too. Jason is now an assistant professor at UCLA and, in my limited experience, is happy to answer questions about STEM.

Short Time-series Expression Miner (STEM)

Jason Ernst Lab

Ram · Answer 3 · 2015-06-03

Without any replicates, you can't really estimate the dispersion. However, you mention in your comments that your colleague has done some testing before; maybe you can estimate a dispersion from that? Note that the main point of having replicates is to estimate dispersion, so if you have some other data to estimate it from, you might get away without any replicates. Please take a look at the edgeR user manual.
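To show why the dispersion matters so much, here is a rough sketch of a test on the log fold-change under an assumed, externally supplied dispersion. It is a delta-method normal approximation for negative binomial counts, not edgeR's exact test, and the BCV values (biological coefficient of variation, the square root of the dispersion) are illustrative guesses rather than estimates from any data.

```python
import math

def approx_wald_test(count_a, count_b, lib_a, lib_b, bcv):
    """Rough two-sided z-test on the log fold-change for one gene.

    Assumes negative binomial counts with dispersion = bcv**2, for which
    Var(log count) is approximately 1/mean + dispersion (delta method).
    Returns (log2 fold-change of B over A, approximate p-value).
    """
    dispersion = bcv ** 2
    y_a, y_b = count_a + 0.5, count_b + 0.5  # small offset avoids log(0)
    log_fc = math.log(y_b / lib_b) - math.log(y_a / lib_a)
    variance = 1.0 / y_a + 1.0 / y_b + 2.0 * dispersion
    z = log_fc / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return log_fc / math.log(2.0), p_value

# One hypothetical gene, 100 vs 400 reads, equal library sizes (made-up numbers).
for bcv in (0.1, 0.4):
    log2_fc, p = approx_wald_test(100, 400, 100_000_000, 100_000_000, bcv)
    print(f"BCV={bcv}: log2FC={log2_fc:.2f}, p~{p:.2g}")
```

The same four-fold change looks overwhelmingly significant under the small BCV and only marginal under the larger one, so any p-value obtained this way is only as trustworthy as the assumed dispersion.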

If you don't have the information, I'm afraid you should just look at log fold-change.