Question: dispersion is NA error message with edgeR
0
gravatar for Sharon
21 months ago by
Sharon440
Sharon440 wrote:

I am getting the following error for the code below and I do have 3 replicates. Any one come through this before?

Error in exactTest(dge, pair = c("WT", "Mut")) : dispersion is NA

labels=c("WT_1", "WT_2", "WT_3", "Mut_1", "Mut_2", "Mut_3")
data <- readDGE(files)
group <- c(rep("WT", 3), rep("Mut", 3))
dge = DGEList(counts=data, group=group)
idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge, pair=c("WT", "Mut"))
etp <- topTags(et)
etp$table$logFC = -etp$table$logFC
edger rna-seq • 815 views
ADD COMMENTlink modified 21 months ago by Kevin Blighe46k • written 21 months ago by Sharon440
0
gravatar for Kevin Blighe
21 months ago by
Kevin Blighe46k
Kevin Blighe46k wrote:

Hey Sharon,

It looks like you have left out your model design formula from the two dispersion functions, for example:

design <- model.matrix(~group)
...
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, design)
dge <- estimateTagwiseDisp(dge, design)

Kevin

ADD COMMENTlink modified 21 months ago • written 21 months ago by Kevin Blighe46k

Hey Kevin Thank you. I think we can do that. It works without the design matrix. The dispersion error comes when I added the genes id addition part. If I removed this genes ID part, every thing works. like if I removed the section below every thing works correctly. But the genes ID are added correctly though. They just cause dispersion error.

idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
ADD REPLYlink modified 21 months ago • written 21 months ago by Sharon440

I see, Sharon. Yes, I actually just noticed your previous thread where you had originally posted this.

There must be some other 'names' attribute of the dge object that is causing the discrepancy.

What about something like names(dge) or rownames(dge), or even just output the result of str(dge) to see the structure of the object, which may help us to identify the issue.

ADD REPLYlink written 21 months ago by Kevin Blighe46k

Thanks Kevin. So, I think the confusion is I am starting from transcripts counts from Salmon and wanted to find the upregulated/downregulated genes in edgeR. So, the confusion comes from converting transcripts ID to genes ID, that introduces some nulls which causes dispersion.

ADD REPLYlink written 21 months ago by Sharon440

Okay, yes, your m vectors will likely contain many NAs for names that don't match. match() does return NAs like that, whereas which() only returns the indices of the matches.

You could either remove these non-matches (not desirable) or convert them back to their original values with something like:

ifelse( is.na(egREFSEQ$gene_id[m]), dge$genes$RefSeqID, egREFSEQ$gene_id[m] )

That should work, but test it well. It will keep anything that has been successfully converted to egREFSEQ$gene_id, but, for non-matches, it will restore the original value (dge$genes$RefSeqID)

ADD REPLYlink modified 21 months ago • written 21 months ago by Kevin Blighe46k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1825 users visited in the last hour