DESEQ for Differential Expression
2.2 years ago
ryme ▴ 30

Hello guys,

I am a PhD student and have just started using R and DESeq2 for RNA data analysis. I am new to R and want to make sure that I am using DESeq2 correctly. I have read the DESeq2 vignette and followed many tutorials, but I would still like to double-check my script with people experienced in this field. Many thanks in advance for your help.

I have 3 cell lines and 3 treatments for each cell line, so 12 samples to analyze (I am using the treatment as variable).

My code:

Reading the data

counts = read.table("counts.txt")
coldata = read.table("meta.txt")

To make sure I have the correct ordering

all(colnames(counts) %in% rownames(coldata))
all(colnames(counts) == rownames(coldata))

DESEQ

dds = DESeqDataSetFromMatrix(countData = counts,
                             colData = coldata,
                             design = ~ Treatment)

dds$Treatment = relevel(dds$Treatment, ref = "siSCR") # set siSCR as the reference level

dds = DESeq(dds)

result = results(dds, contrast = c("Treatment", "siACLY", "siSCR")) # compare siACLY to siSCR

Filtering out rows with NA values, then keeping genes with adjusted p-value < 0.05

filter1 = result[complete.cases(result),]

filter2 = filter1[filter1$padj < 0.05,]

  • My first question is: is it better to split the samples for each comparison and create a new dds object each time, or is it better to keep all the samples in the count table and then specify the comparison with the contrast argument of results()?

In the above code I did not split the samples.

  • When I did not split the samples I got 2,542 DEGs (p adj<0.05)
  • When I split the samples I got 1,242 DEGs (p adj <0.05) (for the same analysis)

By splitting the samples I mean keeping only the samples I want to compare in the count table and removing the others. So the dispersion and the normalization will be calculated based on the kept samples only.

  • My second question is: if I do not split the samples, do I need to relevel the dds object before every comparison, or is that unnecessary since I am specifying the treatments to compare in the contrast argument?

  • My third question is, is there any mistake in my code?

Any help would be appreciated! I do really need experienced people's help. Thank you.

expression DESEQ Rstudio differential DESEQ2 • 647 views
2.2 years ago
LauferVA 4.2k

My first question is: is it better to split the samples at each comparison and create a new dds object for each comparison or it is better to use all the samples in the count table and then specify the comparison with the contrast function?

The answer to this question is that it depends. I hope that the following is not too convoluted.

There are 3 classical methods by which a statistical hypothesis test can be carried out (Wald statistic, score test, or LRT). Although similar logic can be used for any of these, let's consider a Wald statistic.

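The Wald statistic has the form:

$$W = \frac{(\hat{\theta} - \theta_0)^2}{\operatorname{Var}(\hat{\theta})}$$

where $\hat{\theta}$ is the parameter estimate and $\theta_0$ is the parameter value under the null hypothesis. From this we can see that it is: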

  1. proportional to the squared difference between the parameter estimate and its hypothesized (null) value.
  2. inversely proportional to the variance of that estimate.

Let's start with just the samples in the contrast you describe. You will have a certain estimated difference and a certain variance. Now, let's keep adding other samples. If the inclusion of these additional samples decreases the estimated variance, the W statistic will be larger and the p-value will be smaller. If inclusion of the additional samples increases the difference between the estimate and the hypothesized parameter value, same deal.

So, all you need to realize is this: the inclusion of additional samples into the analysis will increase statistical power if the ratio of the estimated difference to the variance of that estimate increases.
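To tie this to DESeq2 specifically: as far as I understand the package documentation, the default Wald test there is written in the equivalent z form, with the estimated log2 fold change of gene i divided by its standard error and compared to a standard normal. The symbols below are my own notation, not taken from the original post.

```latex
% Wald test for gene i and a single contrast (null value of the log2 fold change is 0)
z_i = \frac{\hat{\beta}_i}{\operatorname{SE}(\hat{\beta}_i)}, \qquad
p_i = 2\,\bigl(1 - \Phi(|z_i|)\bigr), \qquad
z_i^2 = \frac{(\hat{\beta}_i - 0)^2}{\operatorname{Var}(\hat{\beta}_i)}
```

The standard error depends on the gene's dispersion estimate and on the size factors, and both are estimated from every sample in the dds object. That is the route by which samples outside the contrast can change the p-value even though they do not enter the fold change being tested.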

It follows that neither approach is uniformly better: one can show mathematically that there are cases in which either approach would have better performance.

Perhaps it will be more intuitive to think of this as altering total variability in relation to within and between group variation...
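A toy illustration of the point above, as a minimal sketch only: it uses an ordinary linear model on simulated values for a single gene as a stand-in for DESeq2's negative binomial GLM, and the third treatment name "siOther", the group means, and the group size of 4 are all made up for the example. The estimated siACLY vs siSCR difference is identical in both fits; only the standard error (and the residual degrees of freedom) changes, because the residual variance is pooled over three groups in one case and two in the other.

```r
set.seed(42)

# Hypothetical design: 3 treatments, 4 samples each ("siOther" is a made-up name)
treatment <- factor(rep(c("siSCR", "siACLY", "siOther"), each = 4),
                    levels = c("siSCR", "siACLY", "siOther"))

# Simulated log-scale expression values for one gene
y <- c(rnorm(4, mean = 5.0, sd = 0.5),
       rnorm(4, mean = 6.0, sd = 0.5),
       rnorm(4, mean = 5.5, sd = 0.5))

# Approach 1: fit all samples, then read off the siACLY vs siSCR coefficient
fit_all <- lm(y ~ treatment)
coef_all <- summary(fit_all)$coefficients["treatmentsiACLY", ]

# Approach 2: keep only the two groups being compared, then fit
keep <- treatment %in% c("siSCR", "siACLY")
y_sub <- y[keep]
treatment_sub <- droplevels(treatment[keep])
fit_sub <- lm(y_sub ~ treatment_sub)
coef_sub <- summary(fit_sub)$coefficients["treatment_subsiACLY", ]

# Same estimate in both rows; standard error and t value generally differ
comparison <- rbind(all_samples = coef_all, split_samples = coef_sub)
print(comparison[, c("Estimate", "Std. Error", "t value")])
```

Whether the standard error goes up or down when the extra group is included depends on how variable that group is, which is the "it depends" from the start of this answer.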
