Dear all, I am trying to run DESeq2 to get the list of differential expressed miRNAs between 330 tumor and 42 normal samples. However when I run the following code:
> dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition) > dds <- estimateSizeFactors(dds) > nc <- counts(dds, normalized=TRUE) > filter <- rowSums(nc >= 10) >= 2 > dds <- dds[filter,] > dds$condition<-relevel(dds$Condition, ref="normal") > design(dds) <- formula(~ Type + Condition) > dds <- DESeq(dds) using pre-existing size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing 32 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest -- replacing outliers and refitting for 124 genes -- DESeq argument 'minReplicatesForReplace' = 7 -- original counts are preserved in counts(dds) estimating dispersions fitting model and testing 29 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest.
I tried to fix this problem with using higher threshold in this code:
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Condition) > keep <- rowSums((counts(dds)) >= 10) >=300 > dds <- dds[keep,] > dds$Condition<-relevel(dds$Condition, ref="normal") > design(dds) <- formula(~ Type + Condition) > dds <- estimateSizeFactors(dds) > dds <- estimateDispersions(dds) gene-wise dispersion estimates mean-dispersion relationship -- note: fitType='parametric', but the dispersion trend was not well captured by the function: y = a/x + b, and a local regression fit was automatically substituted. specify fitType='local' or 'mean' to avoid this message next time. final dispersion estimates > dds <- nbinomWaldTest(dds, maxit=1000) > res <- results(dds) > write.table(res,"D:\\FromD\\EndocrinologyAndMetabolicDisorderIns\\GDC\\deseq2-21khordadWhiteUsingFilter10-300.txt", sep="\t") > plotMA(res, ylim = c(-2,2)) > res <- results(dds, name="condition_Cancer_vs_normal")
However this time I get this error for the last line:
Error: subscript contains invalid names.
Now my question is what is the best threshold that we have to choose when we are working on large number of samples.
I did not face with this kind of errors with smaller number of samples (55 samples).
I would appreciate any comment