Question: dispersion is NA error message with edgeR
0
gravatar for Sharon
2.3 years ago by
Sharon460
Sharon460 wrote:

I am getting the following error for the code below and I do have 3 replicates. Any one come through this before?

Error in exactTest(dge, pair = c("WT", "Mut")) : dispersion is NA

labels=c("WT_1", "WT_2", "WT_3", "Mut_1", "Mut_2", "Mut_3")
data <- readDGE(files)
group <- c(rep("WT", 3), rep("Mut", 3))
dge = DGEList(counts=data, group=group)
idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge)
dge <- estimateTagwiseDisp(dge)
et <- exactTest(dge, pair=c("WT", "Mut"))
etp <- topTags(et)
etp$table$logFC = -etp$table$logFC
edger rna-seq • 1.0k views
ADD COMMENTlink modified 2.3 years ago by Kevin Blighe54k • written 2.3 years ago by Sharon460
0
gravatar for Kevin Blighe
2.3 years ago by
Kevin Blighe54k
Kevin Blighe54k wrote:

Hey Sharon,

It looks like you have left out your model design formula from the two dispersion functions, for example:

design <- model.matrix(~group)
...
dge <- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, design)
dge <- estimateTagwiseDisp(dge, design)

Kevin

ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by Kevin Blighe54k

Hey Kevin Thank you. I think we can do that. It works without the design matrix. The dispersion error comes when I added the genes id addition part. If I removed this genes ID part, every thing works. like if I removed the section below every thing works correctly. But the genes ID are added correctly though. They just cause dispersion error.

idfound <- dge$genes$RefSeqID %in% mappedRkeys(org.Hs.egREFSEQ)
dge <- dge[idfound,]
dim(dge)
egREFSEQ <- toTable(org.Hs.egREFSEQ)
head(egREFSEQ)
m <- match(dge$genes$RefSeqID, egREFSEQ$accession)
dge$genes$EntrezGene <- egREFSEQ$gene_id[m]
egSYMBOL <- toTable(org.Hs.egSYMBOL)
head(egSYMBOL)
m <- match(dge$genes$EntrezGene, egSYMBOL$gene_id)
dge$genes$Symbol <- egSYMBOL$symbol[m]
head(dge$genes)
ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Sharon460

I see, Sharon. Yes, I actually just noticed your previous thread where you had originally posted this.

There must be some other 'names' attribute of the dge object that is causing the discrepancy.

What about something like names(dge) or rownames(dge), or even just output the result of str(dge) to see the structure of the object, which may help us to identify the issue.

ADD REPLYlink written 2.3 years ago by Kevin Blighe54k

Thanks Kevin. So, I think the confusion is I am starting from transcripts counts from Salmon and wanted to find the upregulated/downregulated genes in edgeR. So, the confusion comes from converting transcripts ID to genes ID, that introduces some nulls which causes dispersion.

ADD REPLYlink written 2.3 years ago by Sharon460

Okay, yes, your m vectors will likely contain many NAs for names that don't match. match() does return NAs like that, whereas which() only returns the indices of the matches.

You could either remove these non-matches (not desirable) or convert them back to their original values with something like:

ifelse( is.na(egREFSEQ$gene_id[m]), dge$genes$RefSeqID, egREFSEQ$gene_id[m] )

That should work, but test it well. It will keep anything that has been successfully converted to egREFSEQ$gene_id, but, for non-matches, it will restore the original value (dge$genes$RefSeqID)

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Kevin Blighe54k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1145 users visited in the last hour