How to get differential gene expression within paired samples using EdgeR or DESeq2?
1
1
Entering edit mode
2.1 years ago

Hi Folks,

I am currently working on 16 bulk RNAseq samples. I did some basic research on statistical testing of differential expression and figured out two R packages to edgeR and DESeq2.

The 16 samples are actually paired samples data from before treatment and after treatment from 8 patients. I am interested in finding DE genes within paired samples for instance (P1_AT vs P1_BT). This is the design matrix

Sample   Patient    Treatment
P1_BT       P1       BT
P1_AT       P1       AT
P2_BT       P2       BT
P2_AT       P2       AT
........
P8_BT       P8       BT
P8_AT       P8       AT


So I have created my design matrix like section 4.1 - design <- model.matrix(~ Patient+Treatment)

EdgeR manual, my experiment design is exactly similar to section 4.1 of edgeR manual.

4.1 RNA-Seq of oral carcinomas vs matched normaltissue

Patient <- factor(c(8,8,33,33,51,51))
Tissue <- factor(c("N","T","N","T","N","T"))
data.frame(Sample=colnames(y),Patient,Tissue)
Sample Patient Tissue
8N       8      N
8T       8      T
33N      33      N
33T      33      T
51N      51      N
51T      51      T
design <- model.matrix(~Patient+Tissue)
rownames(design) <- colnames(y)
design
(Intercept) Patient33 Patient51 TissueT
8N            1         0         0       0
8T            1         0         0       1
33N           1         1         0     0
33T           1         1         0       1
51N           1         0         1       0
51T           1         0         1       1
fit <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
Coefficient:  TissueT
RefSeqID   Symbol NbrOfExons logFC logCPM   LR   PValue      FDR
5737   NM_001039585    PTGFR          4 -5.18   4.74 98.7 2.97e-23 3.12e-19
5744      NM_002820    PTHLH          4  3.97   6.21 92.2 8.00e-22 4.21e-18
3479   NM_001111283     IGF1          5 -3.99   5.71 86.5 1.38e-20 4.85e-17
colnames(design)[1] "(Intercept)" "Patient33"   "Patient51"   "TissueT"


But I am interested in getting 1) P1_AT vs P1_BT, 2) P2_AT vs P2_BT ... 8) P8_AT vs P8_BT

To find genes with baseline differences between the drug and the placebo at 0 hours:
qlf <- glmQLFTest(fit, contrast=my.contrasts[,"DrugvsPlacebo.0h"])

my.contrasts <- makeContrasts(
+         Drug.1vs0 = Drug.1h-Drug.0h,
+         Drug.2vs0 = Drug.2h-Drug.0h,
+         Placebo.1vs0 = Placebo.1h-Placebo.0h,
+         Placebo.2vs0 = Placebo.2h-Placebo.0h,
+         DrugvsPlacebo.0h = Drug.0h-Placebo.0h,
+         DrugvsPlacebo.1h = (Drug.1h-Drug.0h)-(Placebo.1h-Placebo.0h),
+         DrugvsPlacebo.2h = (Drug.2h-Drug.0h)-(Placebo.2h-Placebo.0h),
+ levels=design)


3.3.1 Defining each treatment combination as a group

The design matrix is different for section 4.1 and section 3.3.1. My experiment similar to section 4.1, so I have used that design matrix. But I want the results of section 3.3.1 within samples comparison.

RNA-Seq EdgeR DESeq2 Paired samples designmatrix • 2.2k views
1
Entering edit mode
2.1 years ago

I found it somewhat difficult to follow your question due to the blurred screenshots, and the fact that all of the parameters that you list in your design formula are not in the metadata that you show, e.g., Treatment. In fact, the different parts of your question seem somewhat disparate to each other... (?)

For all intents and purposes, if you want to do comparisons like P1_AT vs P1_BT, then simply include Sample in your design formula. However, in this case, it looks like you only have 1 replicate per group?

0
Entering edit mode

Thanks, Kevin for helping me out. I have made some changes to the original post by replacing the screenshots and changing in the design matrix from "Time" to "Treatment".

I want to compare and find DE genes within paired samples like "P1_AT vs P1_BT", "P2_AT vs P2_BT" and so on?

In the above post, two sections from edgeR (section 3.3.1-Multifactor analysis and section 4.1-Paired).

From section4.1, I need some help in understanding the output. f

it <- glmFit(y, design)
lrt <- glmLRT(fit)
topTags(lrt)
Coefficient:  TissueT
RefSeqID   Symbol NbrOfExons logFC logCPM   LR   PValue      FDR
5737   NM_001039585    PTGFR          4 -5.18   4.74 98.7 2.97e-23 3.12e-19
5744      NM_002820    PTHLH          4  3.97   6.21 92.2 8.00e-22 4.21e-1


colnames(design)[1] "(Intercept)" "Patient33" "Patient51" "TissueT"

This is my understanding, TissueT refers to comparison of all tumor samples (8T, 33T, 51T) vs all normal samples (8N, 33N, 51N). Am I correct?

What about Patient33, does it mean Patient33 (N and T) vs Patient8 (N and T)?

What about Patient51, does it mean Patient51 (N and T) vs Patient8 (N and T)?

In order to compare, Patient51 N vs Patient51 T, what design and contrast should be selected?

1
Entering edit mode

This is my understanding, TissueT refers to comparison of all tumor samples (8T, 33T, 51T) vs all normal samples (8N, 33N, 51N). Am I correct?

Yes, this is correct. However, to add, due to the fact that Patient is also included in the design formula (~ Patient + Tissue), the results that you derive for TissueT are controlled for the Tumor-Normal pairing, i.e., it is already a paired analysis.

What about Patient33, does it mean Patient33 (N and T) vs Patient8 (N and T)? What about Patient51, does it mean Patient51 (N and T) vs Patient8 (N and T)?

No, these do not have much useful meaning, in this situation. 'Patient33' would be: Patient33 (T+N) vs [Patient51 (T+N) & Patient8 (T+N)]

In order to compare, Patient51 N vs Patient51 T, what design and contrast should be selected?

You should not derive p-values from a 1 versus 1 comparison. It would be better to use TissueT, which will give you stats for Tumour versus Normal, while controlling for patient-specific effects.

0
Entering edit mode

Hi Kevin

I have a very similar situation and also interested in getting the below kind of comparison for my dataset.

In order to compare, Patient51 N vs Patient51 T, what design and contrast should be selected? You are correct that the p-value comes from a 1 versus 1 comparison should not be considered.

Took the content from edgeR manual,

I got the following colnames for this design_2 = model.matrix(~Subject+Treat)
- (Intercept)
- Subject2
- Subject3
- TreatT

1) qlf <- glmQLFTest(fit),
Coefficient : TreatT. Compare all 3 controls vs all 3 treatments
2) qlf <- glmQLFTest(fit, coef=2),
Coefficient:  Subject2, compare Subject 2(control + treatment) vs Subject 1(control + treatment)
3) qlf <- glmQLFTest(fit, coef=3),
Coefficient:  Subject3, compare Subject 3(control + treatment) vs Subject 1(control + treatment)


I want to compare Subject 1_C vs Subject 1_T, Subject 2_C vs Subject 2_T and Subject 3_C vs Subject 3_T. Therefore I created the new design.

setting dispersion to NA (bcos there are no replicates) but in the attached screenshot from section 3.4.1 (Paired Samples).

Both the control and the treatment are administered to each of three patients. They are comparing the treatment to the control for each patient separately.

How to compare them separately?

• Subject 1_C vs Subject 1_T
• Subject 2_C vs Subject 2_T
• Subject 3_C vs Subject 3_T

What mistake I have done?