Row names and probe names do not match in topTable output
2.6 years ago
Christine • 0

Hello, I am using limma to analyze differential methylation on an 850k Illumina array, and I set up my model as recommended by the user guide. Today I noticed after running topTable() that the row names of the resulting data frame and the "Name" column do not match. An extract of the topTable() output looks like this:

             chr       pos strand       Name 
cg09840645  chr4 152613181      - cg02827491     
cg05161279 chr17  58156411      + cg13790571         
cg26508775  chr4 114213574      + cg02296478         
cg23550779 chr19  51471364      + cg08083242         
cg00391247  chr5 126112230      + cg17185133         
cg01799602  chr4 108104446      + cg18057564

I am worried that all my analyses so far have been based on the "wrong" values and genes for each probe. Which probe name is correct and, ideally, how can I get them to match? I could not find this problem described elsewhere.

Sincerely, Christine

rownames topTable limma • 1.3k views

I'm guessing that you have those extra columns (Name, chr, pos, ...) because you supplied probe annotation somewhere in the pipeline? For example, it can be passed to topTable() via the genelist argument. I think your issue can arise when the rows of the methylation matrix (the one you pass to lmFit()) and the rows of your array annotation are in a different order. Check that they are in the same order. In any case, it would help to see the code you are using.
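
A minimal sketch of such a check, using small made-up objects in place of the real data (the names m and ann and the probe IDs here are hypothetical stand-ins, not the actual EPIC annotation). It only illustrates that pairing rows by position, rather than by probe name, can produce the kind of mismatch shown in the question:

```r
# Toy stand-ins (hypothetical): a 4-probe data matrix and a 5-probe annotation
set.seed(1)
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(c("cg01", "cg02", "cg03", "cg04"),
                            paste0("s", 1:3)))
ann_ids <- c("cg03", "cg01", "cg05", "cg04", "cg02")
ann <- data.frame(Name = ann_ids,
                  chr = paste0("chr", 1:5),
                  row.names = ann_ids,
                  stringsAsFactors = FALSE)

# 1. Do both objects have the same number of rows?
same_nrow <- nrow(m) == nrow(ann)

# 2. Are all probes of the data matrix present in the annotation?
all_present <- all(rownames(m) %in% rownames(ann))

# 3. Are the probes identical and in the same order?
same_order <- identical(rownames(m), rownames(ann))

# Pairing by position (row i of m with row i of ann) mixes up the probes
by_position <- data.frame(row_name = rownames(m),
                          Name = ann$Name[seq_len(nrow(m))],
                          stringsAsFactors = FALSE)
print(by_position)
```

If the third check is FALSE, the annotation should be reordered by probe name before it is used.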


Yes, I forgot to mention that I am using the Illumina annotation from the R package IlluminaHumanMethylationEPICanno.ilm10b4.hg19 as the genelist argument to topTable(). Even though both the data and the gene list are sorted by probe name, the row names and the "Name" column do not match.

Here is the code:

library(limma)
library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)

annEPIC <- getAnnotation(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
annEPIC <- annEPIC[order(annEPIC$Name),]

m <- m[order(rownames(m)),]
individual <- factor(sampleinfo$individual)
status <- factor(sampleinfo$Status)

design <- model.matrix(~status, data=sampleinfo)

statusdc <- duplicateCorrelation(m, design=design, block= individual) #I have a linear mixed model design
fit <- lmFit(m, design, block = individual, correlation = statusdc$consensus)
fit <- eBayes(fit) 
summary(decideTests(fit))
statusDMPs <- topTable(fit, num=Inf, coef=2, genelist=annEPIC)
head(statusDMPs)

It is possible that annEPIC and m do not have the same number of rows, for example because of filtering steps applied to m.

First, check whether all the probes in m are contained in annEPIC using: table(rownames(m) %in% rownames(annEPIC)). If they are, you can put both in the same order with: annEPIC <- annEPIC[rownames(m),] (and check with table(rownames(annEPIC) == rownames(m))).

If some probes in m are missing from annEPIC, you will have to restrict both objects to the shared probes, with something similar to:

common <- intersect(rownames(m), rownames(annEPIC))  # shared probes, in the order of m
annEPIC <- annEPIC[common,]
m <- m[common,]

(and check with table(rownames(annEPIC) == rownames(m))).
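
The steps above can be sketched end to end on toy data. This is only an illustration under stated assumptions: the objects and probe IDs are made up, the names m_aligned and ann_aligned are hypothetical, and the final topTable() call is left as a comment because it needs the real fit object, refitted on the aligned matrix.

```r
# Toy stand-ins (hypothetical): m has been filtered and reordered, ann has not
set.seed(1)
m <- matrix(rnorm(9), nrow = 3,
            dimnames = list(c("cg04", "cg02", "cgXX"), paste0("s", 1:3)))
ann_ids <- c("cg01", "cg02", "cg03", "cg04")
ann <- data.frame(Name = ann_ids,
                  chr = paste0("chr", 1:4),
                  row.names = ann_ids,
                  stringsAsFactors = FALSE)

# Probes present in both objects, kept in the row order of m
common <- intersect(rownames(m), rownames(ann))

# Subset both objects by name so that row i is the same probe in each
m_aligned <- m[common, , drop = FALSE]
ann_aligned <- ann[common, , drop = FALSE]

# Fail loudly if the two objects are still not aligned
stopifnot(identical(rownames(m_aligned), rownames(ann_aligned)),
          identical(rownames(ann_aligned), ann_aligned$Name))

# With the real data, the model would be refitted on the aligned matrix and
# the aligned annotation passed on, e.g.:
# statusDMPs <- topTable(fit, number = Inf, coef = 2, genelist = ann_aligned)
```

Using stopifnot() with identical() instead of table() makes the script stop as soon as the two objects disagree, so a misaligned annotation cannot slip through unnoticed.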


This was the solution, thank you so much!!
