DE analysis with two samples in multiple samples
6.7 years ago
Satyajeet Khare ★ 1.6k

I use edgeR to perform DE analysis following the standard protocol. The steps are as follows.

  1. Alignment using HISAT2,
  2. Count matrix generation using prepDE.py,
  3. DE analysis in edgeR using the likelihood ratio test (LRT).

When I perform DE analysis on a count matrix containing only the two sample groups I want to compare, I get a larger number of differentially expressed genes than when I run the analysis on a count matrix containing many sample groups and compare the same two groups using the contrast parameter.

I assume that the counts from the other groups' samples affect the normalisation and the dispersion estimates for the samples in the two groups of interest.

My question is, which DE genes should I trust? The ones I get from the two-group count matrix, or the ones I get from the all-group count matrix?

EdgeR RNA-Seq • 1.6k views
6.7 years ago

You should get essentially no DE genes with just two samples; if you are getting any, that points to a problem with edgeR (or your usage of it). Never trust unreplicated experiments. You should only follow up on results obtained with multiple samples per group.

Even contrasts where the two "groups" being compared each consist of a single sample are largely meaningless.


Hi Devon,

Please let me correct myself. By two samples I meant two groups.

In other words, if I perform DE analysis between the C1 and T1 groups using a count matrix containing only C1 and T1, I get more DE genes. If I perform DE analysis between the C1 and T1 groups using a count matrix containing C1, T1, C2, T2, C3 and T3, I get fewer DE genes.


That's better then. Your ability to properly assess variance increases with sample number, so in general the design with more groups will be more reliable. My presumption is that in the two-group case, genes with extreme variance aren't being penalized as much.
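
To make the variance point concrete, here is a minimal sketch in plain Python. It is not edgeR's dispersion estimator, only a simplified pooled-variance analogy. The expression values and the choice of three replicates per group are hypothetical and do not come from the data discussed in this thread.

```python
from statistics import mean


def pooled_variance(groups):
    """Return (pooled within-group variance, residual degrees of freedom)."""
    sum_sq = 0.0
    resid_df = 0
    for values in groups.values():
        group_mean = mean(values)
        sum_sq += sum((v - group_mean) ** 2 for v in values)
        resid_df += len(values) - 1  # each group contributes n - 1
    return sum_sq / resid_df, resid_df


# Hypothetical log2 expression values for a single gene, three replicates per group.
all_groups = {
    "C1": [5.0, 5.2, 4.8],
    "T1": [6.0, 6.1, 5.9],
    "C2": [5.1, 4.6, 5.6],
    "T2": [6.3, 5.7, 6.0],
    "C3": [4.9, 5.5, 4.6],
    "T3": [6.2, 5.6, 6.5],
}
two_groups = {name: all_groups[name] for name in ("C1", "T1")}

var_two, df_two = pooled_variance(two_groups)
var_all, df_all = pooled_variance(all_groups)

print(f"C1 + T1 only:   variance={var_two:.3f}, residual df={df_two}")
print(f"all six groups: variance={var_all:.3f}, residual df={df_all}")
```

In this made-up example C1 and T1 happen to have unusually tight replicates, so the two-group fit rests on few degrees of freedom and a small variance estimate, which would make the C1 vs T1 difference look more significant. Pooling across all six groups gives more degrees of freedom and a larger, arguably more realistic, variance for the same gene.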


What if the library prep method for C1, T1, C2, T2 is different from that for C3, T3, C4, T4? Both are poly(A)-based kits, but from different manufacturers.

Would that introduce technical variability? In that case, should I make one count matrix for the first four samples and a separate count matrix for the last four?


Yes, in that case you're probably better off splitting things by prep kit.
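
As a rough illustration of what that split could look like before the matrices go into edgeR, here is a small sketch in plain Python. The gene names, counts, and the `kit_A`/`kit_B` labels are hypothetical placeholders; only the sample labels and their grouping by kit follow the question above.

```python
# Hypothetical count matrix: gene -> {sample: raw count}.
counts = {
    "geneA": {"C1": 10, "T1": 25, "C2": 12, "T2": 30,
              "C3": 40, "T3": 90, "C4": 38, "T4": 85},
    "geneB": {"C1": 7, "T1": 6, "C2": 9, "T2": 8,
              "C3": 20, "T3": 22, "C4": 19, "T4": 21},
}

# Which library prep kit each sample came from (labels are placeholders).
kit_of = {
    "C1": "kit_A", "T1": "kit_A", "C2": "kit_A", "T2": "kit_A",
    "C3": "kit_B", "T3": "kit_B", "C4": "kit_B", "T4": "kit_B",
}


def split_by_kit(counts, kit_of):
    """Return kit -> {gene -> {sample: count}}, one sub-matrix per kit."""
    matrices = {}
    for gene, row in counts.items():
        for sample, value in row.items():
            kit_matrix = matrices.setdefault(kit_of[sample], {})
            kit_matrix.setdefault(gene, {})[sample] = value
    return matrices


by_kit = split_by_kit(counts, kit_of)
for kit, matrix in by_kit.items():
    print(kit, matrix)
```

Each sub-matrix would then be analysed on its own, so normalisation and dispersion are estimated only from samples prepared with the same kit.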
