Question

Gene expression: comparing replicates

10.2 years ago

jackuser1979 ▴ 890

I have normal tissue (replicates A1 and A2) and abnormal tissue (replicates B1 and B2), and I would like to do a gene expression analysis. I don't have a closely related reference genome, so I would like to de novo assemble all samples (normal and abnormal tissue transcripts) and then map the reads from each sample to this assembled transcriptome. That would give me FPKM values for each replicate (A1, A2, B1 and B2). Can I then use Cufflinks to find the significantly differentially expressed genes across all four samples?

My questions are:
1. With the Cufflinks pipeline, can I get only the significant genes (up-regulated and down-regulated) between normal and abnormal tissue, irrespective of replicate?
2. Can we tell which replicate is best within normal tissue (A1 or A2) and within abnormal tissue (B1 or B2)? That is, can we compare replicates A1 and A2 to pick the best replicate for normal tissue, and similarly compare B1 and B2 to pick the best replicate for abnormal tissue?

RNA-Seq gene expression • 3.5k views

updated 10.0 years ago by Biostar 20 • written 10.2 years ago by jackuser1979 ▴ 890

Ram · Answer 1 · 2014-06-04

Without a reference genome, how (and why) would you use Cufflinks? If you're going to do de novo assembly, you can simply assemble the transcriptome de novo and then use Bowtie or another aligner to count the reads mapping to each transcript. Since you know each transcript's length, you can then calculate FPKM for each transcript in each replicate. You could also use edgeR to generate normalized RPKM values and to test for differential expression. The edgeR user's guide has a section on what to do if you have no replicates, which may be informative given your wish to evaluate the replicates relative to each other.

However, comparing replicate A to replicate B to see which is "best" requires some definition of "best", probably relative to something outside your system. You could use the multidimensional scaling (MDS) plot in edgeR (plotMDS) to see how your replicates sit relative to each other, do a PCA, or simply examine the correlation of all your data sets with each other. Do the replicates correlate more highly with each other than with the other sample type? Comparing two replicates will show you how they differ, but without an outside reference or some other criterion you won't know what that difference means, i.e. which one is "right". It's sort of like clapping your hands together and asking which one is louder.
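A minimal Python sketch of the two calculations described above: FPKM from raw read counts and transcript lengths, and pairwise correlation between replicates. It assumes you already have per-transcript counts from aligning each sample to the shared assembly. The function names and the log2(FPKM + 1) transform are illustrative choices, not edgeR's implementation; in practice edgeR's rpkm() and plotMDS(), or a PCA, would do this for you.

```python
import math


def fpkm(counts, lengths_bp):
    """Return FPKM for every transcript in one replicate.

    counts: dict transcript_id -> fragments mapped (reads for single-end data,
        read pairs for paired-end data)
    lengths_bp: dict transcript_id -> transcript length in bases
    FPKM = count * 10^9 / (length_bp * total_mapped_fragments)
    """
    total = sum(counts.values())
    return {t: c * 1e9 / (lengths_bp[t] * total) for t, c in counts.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(fpkm_by_sample):
    """Pairwise Pearson correlation of log2(FPKM + 1) between samples.

    fpkm_by_sample: dict sample_name -> dict transcript_id -> FPKM. All samples
    are assumed to share the same transcript IDs (one common assembly).
    """
    names = list(fpkm_by_sample)
    ids = sorted(fpkm_by_sample[names[0]])
    logged = {
        s: [math.log2(fpkm_by_sample[s][t] + 1) for t in ids] for s in names
    }
    return {(a, b): pearson(logged[a], logged[b]) for a in names for b in names}


if __name__ == "__main__":
    # Toy numbers for illustration only, not real data.
    counts = {"t1": 100, "t2": 900}
    lengths = {"t1": 1000, "t2": 3000}
    print(fpkm(counts, lengths))  # {'t1': 100000.0, 't2': 300000.0}
```

If A1 and A2 correlate more strongly with each other than with B1 or B2, the replicates are behaving as expected; as noted above, this still does not tell you which replicate is "best".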

Ram · Answer 2 · 2014-06-04

In general, I would always recommend using replicates to capture biological variability (it looks like you have 2 groups, each with duplicates, so that is good).

Among the popular RNA-Seq analysis tools, I prefer DESeq (but I think limma is also pretty good).

As pointed out by seidel, I think the real difficulty is determining how to define a common set of genes for the differential expression analysis when working with de novo assembly data. This is actually a fairly common question, so I collected a set of suggestions in a blog post:

http://cdwscience.blogspot.com/2014/04/differential-expression-without.html

The most direct answer is that I would suggest creating one reference from all 4 samples and then aligning each sample separately (unless you want to try a k-mer-based strategy like NIKS or RUFUS).
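With one shared reference, every sample's counts refer to the same transcript IDs, so they can be stacked into a single count matrix, which is the kind of input DESeq and edgeR expect (both model raw counts rather than FPKM). A minimal Python sketch, assuming each sample's counts have already been parsed into a dictionary; the function names are hypothetical, and file parsing is not shown because it depends on the counting tool used.

```python
import csv
import io


def build_count_matrix(per_sample_counts):
    """Stack per-sample read counts into one transcript x sample matrix.

    per_sample_counts: dict sample_name -> dict transcript_id -> raw read count,
    where every sample was aligned to the SAME assembly built from all samples.
    A transcript missing from a sample's counts is treated as 0 reads.
    Returns (transcript_ids, sample_names, rows); rows[i][j] is the count of
    transcript_ids[i] in sample_names[j].
    """
    samples = sorted(per_sample_counts)
    # Union of transcript IDs seen in any sample (iterating a dict yields keys).
    ids = sorted(set().union(*per_sample_counts.values()))
    rows = [[per_sample_counts[s].get(t, 0) for s in samples] for t in ids]
    return ids, samples, rows


def to_tsv(ids, samples, rows):
    """Render the matrix as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript_id"] + samples)
    for t, row in zip(ids, rows):
        writer.writerow([t] + row)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not real data.
    counts = {"A1": {"t1": 5, "t2": 3}, "B1": {"t2": 7, "t3": 1}}
    print(to_tsv(*build_count_matrix(counts)), end="")
```

Separate per-sample assemblies would instead give each sample its own, non-matching transcript IDs, which is exactly the "common set of genes" problem described above.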