Differential gene expression analysis: different methods
9.2 years ago
jackuser1979 ▴ 890

One of my colleagues outsourced the data analysis, and the provider used the method below for the differential expression analysis.

First, they did de novo assemblies of controlA, controlB, treatedA and treatedB separately. The assembled transcripts from all the samples were then iteratively clustered to produce uni-transcripts with the fewest redundant sequences (they did not mention how they clustered to produce the uni-transcripts). The pre-processed reads were then mapped to the uni-transcripts to carry out the expression analysis and differential expression analysis using tophat-cufflinks.

They used the following clustering parameters:

  • minimum identity for overlaps: 96%
  • minimum overlap length: 50bp
  • maximum length of unmatched overhangs: 50bp
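Since the provider did not say how the clustering was done, the following is only a minimal sketch of how these three thresholds could be applied to one pairwise overlap between two assembled transcripts. The `Overlap` record and the `passes_thresholds` function are hypothetical names made up for illustration; they are not taken from the provider's pipeline or from any particular clustering tool.

```python
from dataclasses import dataclass


@dataclass
class Overlap:
    """Hypothetical summary of a pairwise alignment between two transcripts."""
    identity_pct: float  # percent identity within the overlapping region
    overlap_len: int     # length of the overlapping region, in bp
    max_overhang: int    # longest unmatched overhang at either end, in bp


def passes_thresholds(ov, min_identity=96.0, min_overlap=50, max_overhang=50):
    """Return True if the overlap satisfies all three clustering thresholds.

    Defaults mirror the parameters listed above; treating the limits as
    inclusive is an assumption.
    """
    return (
        ov.identity_pct >= min_identity
        and ov.overlap_len >= min_overlap
        and ov.max_overhang <= max_overhang
    )


# Example: a long, near-identical overlap with short overhangs is accepted,
# while an overlap with lower identity is rejected.
good = Overlap(identity_pct=97.5, overlap_len=120, max_overhang=10)
bad = Overlap(identity_pct=95.0, overlap_len=120, max_overhang=10)
print(passes_thresholds(good), passes_thresholds(bad))
```

Transcript pairs that pass such a check would presumably be merged into the same cluster, which is how redundant transcripts from the four separate assemblies could collapse into one uni-transcript.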

They considered a uni-transcript differentially expressed if the fold change was ≥ 4 and the q-value was < 0.05, or if expression was detected in only one sample condition and the q-value was < 0.05.
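To make the cutoffs concrete, here is a rough sketch of that filter. It assumes the fold change is counted in either direction (up or down), that the two rules are alternatives, and that the expression values and q-value have already been produced by the upstream tool; the function name `is_differentially_expressed` is hypothetical.

```python
def is_differentially_expressed(control, treated, q_value,
                                min_fold=4.0, max_q=0.05):
    """Apply the fold-change and q-value cutoffs to one uni-transcript.

    control, treated: expression values (e.g. FPKM) for the two conditions.
    Assumption: a change of min_fold in either direction counts.
    """
    if q_value >= max_q:
        return False  # not significant, whatever the fold change
    if control == 0 and treated == 0:
        return False  # not expressed in either condition
    if control == 0 or treated == 0:
        return True   # expression detected in only one condition
    fold = max(control, treated) / min(control, treated)
    return fold >= min_fold


# Example: 4-fold up with q = 0.01 passes; 3-fold up with q = 0.01 does not.
print(is_differentially_expressed(2.0, 8.0, 0.01))
print(is_differentially_expressed(2.0, 6.0, 0.01))
```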

In my experience, I usually de novo assemble all the samples (controlA, controlB, treatedA and treatedB) together into one reference transcript assembly. Then I map the reads from each sample (controlA, controlB, treatedA and treatedB) to that reference and get FPKM values for each sample. Then I do the differential expression analysis on the mapped reads with one of the differential expression packages (edgeR, DESeq).
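For reference, the per-sample FPKM step can be sketched with the standard definition (fragments per kilobase of transcript per million mapped fragments). The transcript length, counts and library sizes below are made-up numbers, and the `fpkm` helper is a hypothetical name, not part of any of the tools mentioned. Note also that edgeR and DESeq are designed to take raw read counts as input rather than FPKM values.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Made-up example: one 1,000 bp transcript from the pooled reference,
# with (fragments mapped to it, total mapped fragments) for each sample.
samples = {
    "controlA": (100, 1_000_000),
    "controlB": (120, 1_200_000),
    "treatedA": (400, 1_000_000),
    "treatedB": (450, 900_000),
}
for name, (count, library_size) in samples.items():
    print(name, round(fpkm(count, 1000, library_size), 2))
```

Because every sample is mapped to the same pooled reference, the values for a given transcript are directly comparable across the four samples.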

Do you think the method used for my colleague's outsourced analysis is correct? If it is, why would they want to do the clustering?

denovo-assembly differential-gene-expression • 2.8k views

On a small side note, when you only have 2 samples in each group, I would be very wary of any findings you make. The statistical power with such a sample size is simply too low to draw any solid conclusions.

9.2 years ago

First, you probably should mention "de novo assembly" in the question and tags in order to get more responses.

This is probably not how I would do this type of analysis, but this is a common question. I've collected my own suggestions in this post.

My recommendation would probably be to use a strategy that involves defining a single pooled reference, which you can then use for a normal differential expression analysis. However, I noticed that a paper describing the Corset algorithm was recently published, which might be more similar to the strategy that you are describing. You can take a look at that paper to see how it compared to your own "clustering" strategy:

