Question

DESeq2 for Differential expression Ensembl gene ID

0

Entering edit mode

4.8 years ago

Rob ▴ 170

Hi friends I use DESeq2 for differential expression analysis for HTSeq read count data with about 20000 genes. my genes have ENSEMBL ID. I have 44 patients (2 groups each with 22 patients) The result shows p-adjust all 0.9. Why is that? How can I fix it?

The code I use is this:

## Reading in raw data 
rdata <- read.table("my_data.txt", header = TRUE, row.names = 1)

library(DESeq2)

## Create metadata 
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 22), rep("1", 22)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                             design = ~Group)

dd <- DESeq(dds)
res <- results(dd)
write.csv(res,"res.csv")

Also in this line of code I get this warning message from Rstudio but I do not think it affects my result:

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                             design = ~Group)


Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

RNA-Seq • 1.1k views

ADD COMMENT • link updated 4.8 years ago by Michael 55k • written 4.8 years ago by Rob ▴ 170

score 0 · Answer 1 · 2020-08-31

0

Entering edit mode

4.8 years ago

Michael 55k

Could you try the following shorthand code:

# generate some dummy data to check if the setup is sane 
dds.example <- makeExampleDESeqDataSet(n=nrow(rdata), m=44, betaSD = 1)


## Create metadata 
groups <- factor(c(rep("0", 22), rep("1", 22)))

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = DataFrame(groups), design = ~ groups)
dds.example <- DESeqDataSetFromMatrix(countData = counts(dds.example),
                              colData = DataFrame(groups), design = ~ groups)


dds <- DESeq(dds)
dds.example <- DESeq(dds.example)

res <- results(dds, contrast=c("groups", "0","1"))
res.example <- results(dds.example, contrast=c("groups", "0","1"))

print (paste ( "Found ", sum(res$padj < 0.05), "significant in my experimental data" ))
print (paste ( "Found ", sum(res.example$padj < 0.05), "significant in the synthetic data at dispersion = 1" ))

If there are still no significant genes, then you might have to try something else. There are cases where there is no significant effect.

ADD COMMENT • link 4.8 years ago by Michael 55k

0

Entering edit mode

Thanks Michael In your code I got no sig genes for the res but for the res.example I got around 7000 genes sig FDR<0.05. I did not understand why? Can I use this result or this is just example and not real result for my data? I did not understand why I have no sig genes in res.

ADD REPLY • link 4.8 years ago by Rob ▴ 170

0

Entering edit mode

res.example is synthetic simulated data, you should get DEG's with the settings. I have just introduced this as a quick test that the setup is ok because I don't have your data. If you don't get any gene with FDR < 0.05 that could mean there aren't any, at least none DESeq2 can detect. Certainly, this is disappointing because of the large number of samples. Most current methods are still being developed towards small sample sizes.

Reasons could be that the overall biological variability is too high, the effect size is extremely small (like comparing an ineffective drug treated group with a placebo group), up- vs. down-regulation is imbalanced or the groups are not properly set up. You could also try different methods such as edgeR.rb. Have a look at review papers like this one: https://doi.org/10.1371/journal.pone.0232271 and try to select a method with higher power on large sample size.

Still, the most likely explanation remains that you got no DEG's because there simply aren't any.

ADD REPLY • link 4.8 years ago by Michael 55k