Hello,
I am running a WGCNA analysis on public RNA-seq expression data. Some of the gene IDs are in Entrez, others in Ensembl. I want them all in Entrez. I tried using both NCBI and BioMart to convert the Ensembl IDs, but whichever way I do it, around 3,000 of the 20,000 genes don't have matches. After a little investigating, I discovered that these IDs are "retired" (example).
Should I hunt harder to match these retired Ensembl IDs to their current equivalent Entrez IDs? Or is it safe to assume that these are "fringe elements" whose status as a gene was revoked, and that it's okay to leave them out of my analysis?
Any insight is appreciated.
Thanks,
Maureen
Wouter is right. There are often legitimate reasons why we would change the ID of a gene: maybe we've split it into two genes or merged two together, but the gene still exists. There are also cases where we have retired genes because they're dodgy. It's worth investigating them, though you'll probably gain some and lose some in the process.
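If it helps, here is a rough Python sketch of one way to triage the unmatched IDs before deciding what to drop. It is only an illustration: the function name `triage_ids`, the ID strings, and both lookup tables are made up, and it assumes you have already built an Ensembl-to-Entrez table and a table of successor IDs for retired genes from whatever source you trust. IDs with exactly one mappable successor are rescued; splits, merges with ambiguous successors, and IDs with no successor are left for manual review.

```python
def triage_ids(ensembl_ids, ensembl_to_entrez, replacements=None):
    """Split Ensembl IDs into mapped, rescued and unresolved groups.

    ensembl_to_entrez: dict of current Ensembl ID -> Entrez ID (assumed input).
    replacements: dict of retired Ensembl ID -> list of successor Ensembl IDs
                  (assumed input; build it from the ID history you look up).
    """
    replacements = replacements or {}
    mapped, rescued, unresolved = {}, {}, []
    for eid in ensembl_ids:
        if eid in ensembl_to_entrez:
            mapped[eid] = ensembl_to_entrez[eid]
            continue
        # Keep only successors that themselves map to an Entrez ID.
        successors = [s for s in replacements.get(eid, []) if s in ensembl_to_entrez]
        if len(successors) == 1:
            rescued[eid] = ensembl_to_entrez[successors[0]]
        else:
            # No successor, or an ambiguous split: needs a manual look.
            unresolved.append(eid)
    return mapped, rescued, unresolved


# Made-up placeholder IDs, purely for illustration.
ids = ["ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"]
table = {"ENSG_A": "101", "ENSG_B2": "202"}
successors = {"ENSG_B": ["ENSG_B2"], "ENSG_C": ["ENSG_C1", "ENSG_C2"]}

mapped, rescued, unresolved = triage_ids(ids, table, successors)
print(len(mapped), len(rescued), len(unresolved))
```

The size of the unresolved group then tells you how much you would actually lose by leaving those genes out.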