Question

error in getting results from DESeq2

0

Entering edit mode

7.1 years ago

nazaninhoseinkhan ▴ 530

Dear all, I am trying to run DESeq2 to get the list of differential expressed miRNAs between 330 tumor and 42 normal samples. However when I run the following code:

> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition)

> dds <- estimateSizeFactors(dds)

> nc <- counts(dds, normalized=TRUE)

> filter <- rowSums(nc >= 10) >= 2

> dds <- dds[filter,]

> dds$condition<-relevel(dds$Condition, ref="normal")

> design(dds) <- formula(~ Type + Condition)

> dds <- DESeq(dds)

using pre-existing size factors

estimating dispersions

gene-wise dispersion estimates

mean-dispersion relationship

final dispersion estimates

fitting model and testing

32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

-- replacing outliers and refitting for 124 genes

-- DESeq argument 'minReplicatesForReplace' = 7 

-- original counts are preserved in counts(dds)

estimating dispersions

fitting model and testing

29 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest.

I tried to fix this problem with using higher threshold in this code:

dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition)

> keep <- rowSums((counts(dds)) >= 10) >=300 

> dds <- dds[keep,]

> dds$Condition<-relevel(dds$Condition, ref="normal")

> design(dds) <- formula(~ Type + Condition)

> dds <- estimateSizeFactors(dds)

> dds <- estimateDispersions(dds)

gene-wise dispersion estimates

mean-dispersion relationship

-- note: fitType='parametric', but the dispersion trend was not well captured by the
   function: y = a/x + b, and a local regression fit was automatically substituted.
   specify fitType='local' or 'mean' to avoid this message next time.

final dispersion estimates

> dds <- nbinomWaldTest(dds, maxit=1000)

> res <- results(dds)

> write.table(res,"D:\\FromD\\EndocrinologyAndMetabolicDisorderIns\\GDC\\deseq2-21khordadWhiteUsingFilter10-300.txt", sep="\t")

> plotMA(res, ylim = c(-2,2))

> res <- results(dds, name="condition_Cancer_vs_normal")

However this time I get this error for the last line: Error: subscript contains invalid names.

Now my question is what is the best threshold that we have to choose when we are working on large number of samples.

I did not face with this kind of errors with smaller number of samples (55 samples).

I would appreciate any comment

Nazanin

DESeq2 results error • 15k views

ADD COMMENT • link updated 7.1 years ago by Devon Ryan 105k • written 7.1 years ago by nazaninhoseinkhan ▴ 530

0

Entering edit mode

You get the error immediately upon running the last line or elsewhere? What threshold are you asking about in your post?

ADD REPLY • link 7.1 years ago by Devon Ryan 105k

0

Entering edit mode

Yes, I got the error when I use

res <- results(dds, name="condition_Cancer_vs_normal")

I want to know what thresholds we are allowed to use in :

keep <- rowSums((counts(dds)) >= 10) >=300

and in:

dds <- nbinomWaldTest(dds, maxit=1000)

ADD REPLY • link updated 6.0 years ago by Ram 45k • written 7.1 years ago by nazaninhoseinkhan ▴ 530

0

Entering edit mode

Any particular reason for choosing 10 and 300 in

> rowSums((counts(dds)) >= 10) >=300

I have seen several papers/tutorials using different strategy like rowSums>0 to filter low count genes, but I could never get the real reason.

ADD REPLY • link updated 6.0 years ago by Ram 45k • written 7.0 years ago by Arindam Ghosh ▴ 550

score 0 · Answer 1 · 2018-06-12

0

Entering edit mode

7.1 years ago

Devon Ryan 105k

You can skip using keep entirely, that's unlikely to help you in any way. The maxit value should be high enough that you get convergence, so if you didn't get an error with that then it's fine. You're receiving Error: subscript contains invalid names. because you mistyped one of the levels. My guess is that you should be using name="condition_cancer_vs_normal" (note the lowercase C).

ADD COMMENT • link 7.1 years ago by Devon Ryan 105k

0

Entering edit mode

Hi,

Thank you so much for your reply.

Yes, this morning I realized that I misspelled the "Condition".

And thanks for your advice about maxit. I think it is about the number of iterations and does not affect my data.

ADD REPLY • link 7.1 years ago by nazaninhoseinkhan ▴ 530