Question: DESeq2 differential expression: is there a limit on the number of genes?
0
gravatar for rhasanvandj
12 weeks ago by
rhasanvandj20
rhasanvandj20 wrote:

Hi friends, how many genes can be used for differential expression analysis with DESeq2? I have 60,000 genes, but it did not give me results for all of them. What should I do?

rna-seq • 178 views
modified 12 weeks ago by ATpoint41k • written 12 weeks ago by rhasanvandj20

What result are you expecting? DESeq2 will give you a list of differentially expressed genes and statistics on them.

written 12 weeks ago by RamRS30k

No, DESeq2 could not handle 60 thousand genes. For some genes the results are NA, and for the other genes there is no significant adjusted p-value.

written 12 weeks ago by rhasanvandj20

To follow up on RamRS's comment, I think it would be more helpful if you provided more information regarding your DESeq2 run (i.e. samples, groups, design formula). You can try to check this section of the DESeq2 vignette regarding the NA values.

written 12 weeks ago by newbio17260

I used this code and none of the genes were significantly differentially expressed. So I think it cannot calculate differential expression for a large number of genes. Are there any suggestions?

rdata <- read.table("myData.txt", header = TRUE, row.names = 1)

library(DESeq2)

## Differential abundance
alpha <- 0.05 #set the cutoff value

## Create metadata - got this info from the first line of the raw data file
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 22), rep("1", 22)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                              design = ~Group)

dd <- DESeq(dds)
res <- results(dd)

write.csv(res,"res.csv")
modified 12 weeks ago by genomax92k • written 12 weeks ago by rhasanvandj20

Some quick points:

  • You are not doing LFC shrinkage; see the "Quick start" and "Log fold change shrinkage for visualization and ranking" sections of the DESeq2 vignette.
  • Please read the section on NA p-values in the DESeq2 vignette.
  • You create a variable, alpha, but then never use it anywhere. If unsure, leave values in DESeq2 functions at their defaults.
  • Please perform some pre-filtering on your raw counts to remove lowly expressed genes (although this is not strictly necessary, as these are the very genes that are most likely to end up with NA p-values anyway).
written 12 weeks ago by Kevin Blighe67k
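The points above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual analysis: the data are simulated with makeExampleDESeqDataSet() as a stand-in for the real count table, the filter thresholds (10 reads, 6 samples) are arbitrary choices, and type = "normal" is used for shrinkage only because it needs no extra package.

```r
library(DESeq2)

# Simulated stand-in for the real count table (assumption): 1000 genes, 6 vs 6 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# Pre-filter: keep genes with at least 10 reads in at least 6 samples
# (6 = size of the smallest group here; both thresholds are illustrative)
keep <- rowSums(counts(dds) >= 10) >= 6
dds <- dds[keep, ]

dds <- DESeq(dds)

# Actually pass alpha to results(), so independent filtering is tuned to this cutoff
alpha <- 0.05
res <- results(dds, alpha = alpha)
summary(res)

# Shrink log2 fold changes for ranking and visualization.
# coef = 2 is the second entry of resultsNames(dds), i.e. the group comparison.
resultsNames(dds)
res_shrunk <- lfcShrink(dds, coef = 2, type = "normal")
head(res_shrunk[order(res_shrunk$padj), ])
```

With the design from the question (~Group, levels "0" and "1"), the coefficient name reported by resultsNames() would be expected to look like "Group_1_vs_0" rather than the simulated "condition_B_vs_A".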
ATpoint41k wrote:

It works on an arbitrary number of genes. I have used it for setups with more than 100,000 regions before. Please read the DESeq2 manual to see why NAs appear in the results. Non-significant adjusted p-values are expected: the results object contains all genes, not only the significant ones.

written 12 weeks ago by ATpoint41k
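For reference, the DESeq2 vignette gives three reasons for NA values: all counts for a gene are zero (baseMean is 0 and everything is NA), the gene has an extreme count outlier flagged by Cook's distance (pvalue and padj are NA), or the gene was removed by independent filtering for low mean count (only padj is NA). A minimal base R sketch that tallies genes by these reasons is shown below; classify_na() and the toy table are hypothetical names and made-up values for illustration only.

```r
# Classify each gene by the reason for its NA values, following the rules
# described in the DESeq2 vignette. `res_df` is assumed to be a data frame
# with the columns baseMean, pvalue and padj, e.g. as.data.frame(res).
classify_na <- function(res_df) {
  reason <- rep("tested", nrow(res_df))
  # p-value present but adjusted p-value missing: independent filtering
  reason[!is.na(res_df$pvalue) & is.na(res_df$padj)] <-
    "independent filtering (low mean count)"
  # p-value missing although the gene has counts: outlier flagged by Cook's distance
  reason[is.na(res_df$pvalue) & res_df$baseMean > 0] <-
    "count outlier (Cook's distance)"
  # no counts in any sample
  reason[res_df$baseMean == 0] <- "all counts zero"
  factor(reason)
}

# Toy table with made-up values, one gene per category
toy <- data.frame(
  baseMean = c(0, 3.2, 150, 800),
  pvalue   = c(NA, 0.40, NA, 0.001),
  padj     = c(NA, NA, NA, 0.01)
)
table(classify_na(toy))
```

On a real analysis, table(classify_na(as.data.frame(res))) would show how many of the 60,000 genes fall into each category.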

Thanks. Yes, non-significant genes are expected, but not for all genes. In my case all genes showed a padj of 0.9, and of course that is wrong. Any solution?

written 12 weeks ago by rhasanvandj20

Why is this wrong? You can have no DEGs either because there are none between the conditions (in biological reality), because your study is underpowered, or because variation between replicates is too large. In any of those cases this is a totally valid result.

written 12 weeks ago by ATpoint41k

I would not say this is wrong, but it is perhaps a little worrying regarding data quality. The study does not seem underpowered in terms of replicates (22), but perhaps the read counts are very low? You can easily check that using an MA plot. Or perhaps there is huge variability between replicates? Check the raw and DESeq2-normalized data (counts(dds, normalized=T)) for a few highly expressed genes and see if they make sense. You could also do a principal component analysis to assess whether the replicates cluster together and what percentage of the total variability is associated with that grouping.

written 12 weeks ago by Carlo Yague5.2k
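The three checks suggested above (MA plot, raw versus normalized counts for highly expressed genes, PCA) could look roughly like this. It is a sketch on simulated data from makeExampleDESeqDataSet(), so the grouping column is called "condition" here; with the design from the question it would be "Group". The choice of the top 5 genes and of the variance stabilizing transformation before PCA are assumptions made for illustration.

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption): 2000 genes, 6 vs 6 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)
dds <- DESeq(dds)
res <- results(dds)

# 1. MA plot: log2 fold change against mean normalized count
plotMA(res, ylim = c(-3, 3))

# 2. Raw and normalized counts for the five most highly expressed genes
top <- head(order(res$baseMean, decreasing = TRUE), 5)
counts(dds)[top, ]
round(counts(dds, normalized = TRUE)[top, ])

# 3. PCA on variance-stabilized counts; returnData = TRUE gives the coordinates
#    plus the percentage of variance explained by the two plotted components
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
percent_var <- round(100 * attr(pca, "percentVar"))
percent_var
```

If the two groups do not separate on the first components and the first component explains only a small share of the variance, that would be consistent with a real lack of strong group differences rather than a software limit.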

It would indeed be good to get some more details. If this is a cell line experiment, n=22 is awesome; if this is, e.g., a patient cohort investigating gender-specific drug response, n=22 is probably not enough.

written 12 weeks ago by ATpoint41k

Hi, thank you. They are patients: 22 patients vs. 22 controls. The data are HTSeq read counts.

written 12 weeks ago by rhasanvandj20

Hi, I added counts(dds, normalized=T). After running it in RStudio, it showed this error:

Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE

written 12 weeks ago by rhasanvandj20

There are multiple fixes for this, but in your case the easiest would be to call the counts() function after the DESeq() function. So counts(dd, normalized=T) should work.

modified 12 weeks ago • written 12 weeks ago by Carlo Yague5.2k

I did, but I still get this:

Error in .local(object, ...) : first calculate size factors, add normalizationFactors, or set normalized=FALSE

written 12 weeks ago by rhasanvandj20
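The error message itself says what is missing: the object passed to counts() has no size factors yet. One plausible cause (an assumption, since the exact call was not posted) is that counts() was still called on dds, the object created before DESeq(), rather than on dd, the object DESeq() returned. Another of the possible fixes is to compute size factors directly, sketched here on simulated data:

```r
library(DESeq2)

# Simulated stand-in for the real data (assumption): 500 genes, 3 vs 3 samples
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 6)

# A freshly built DESeqDataSet has no size factors, which is what
# counts(dds, normalized = TRUE) complains about
sizeFactors(dds)

# Estimate them and assign the result back to the object
dds <- estimateSizeFactors(dds)
sizeFactors(dds)

# Normalized counts are now available
norm_counts <- counts(dds, normalized = TRUE)
head(round(norm_counts))
```

DESeq() runs this step internally, so counts(dd, normalized = TRUE) also works as long as dd is the object returned by dd <- DESeq(dds).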