How to resolve NAs when annotating a diff gene list with org.Hs.eg.db terms?
3.5 years ago
TonyCN ▴ 60

I'm having trouble resolving NAs while annotating a list of differentially expressed genes with Entrez IDs, using Ensembl IDs as keys. I would be surprised if this hadn't been asked before, but finding answers and suggestions is half the battle when you're not quite sure what to search for. I have two main questions, both asked below.

I've followed the steps outlined in the Bioconductor RNA-seq DESeq2 tutorial. The reference transcriptome is Homo_sapiens.GRCh38.v100, with coding and non-coding sequences combined. I have a list of diff genes for each of several compound-treatment experiments. I need Entrez IDs for a chemistry process downstream from here, so I ran what you would expect:

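library("AnnotationDbi")
library("org.Hs.eg.db")  # package name is org.Hs.eg.db, not org.Hs.org.db
# ens.str: character vector of Ensembl gene IDs, defined earlier (not shown here)
resAmi$entrez <- mapIds(org.Hs.eg.db,
                     keys=ens.str,
                     column="ENTREZID",
                     keytype="ENSEMBL",
                     multiVals="first")  # keep only the first match when there are several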

A good proportion of the diff genes in each list are given an Entrez ID of NA. Firstly, why are there NAs? Is it something to do with Ensembl dropping gene mappings after a particular version of their DB? That idea comes from a random comment I found on Biostars!
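To narrow down where the NAs come from, a first step might be to count them and to check whether the keys carry version suffixes (such as ".12"), which would stop them matching. The following is only a rough sketch: the helper names `strip_version` and `summarise_mapping` and the toy IDs are made up for illustration, and the real-data part assumes org.Hs.eg.db is installed and that `ens.str` exists in the session.

```r
# Strip an optional version suffix such as ".12" from Ensembl gene IDs.
strip_version <- function(ids) sub("\\.[0-9]+$", "", ids)

# Count mapped and unmapped keys; `mapped` must be aligned with `keys`.
summarise_mapping <- function(keys, mapped) {
  stopifnot(length(keys) == length(mapped))
  c(total     = length(keys),
    mapped    = sum(!is.na(mapped)),
    unmapped  = sum(is.na(mapped)),
    versioned = sum(grepl("\\.[0-9]+$", keys)))
}

# Toy input: hypothetical IDs and mappings, not real results.
toy_keys   <- c("ENSG00000000001.4", "ENSG00000000002", "ENSG00000000003.11")
toy_mapped <- c(NA, "111", NA)
print(summarise_mapping(toy_keys, toy_mapped))
print(strip_version(toy_keys))

# Real data: runs only if org.Hs.eg.db is installed and ens.str is defined.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) && exists("ens.str")) {
  db     <- org.Hs.eg.db::org.Hs.eg.db
  clean  <- strip_version(ens.str)
  entrez <- AnnotationDbi::mapIds(db, keys = clean, column = "ENTREZID",
                                  keytype = "ENSEMBL", multiVals = "first")
  print(summarise_mapping(ens.str, entrez))
  # Ensembl IDs that the package does not know about at all
  known <- AnnotationDbi::keys(db, keytype = "ENSEMBL")
  print(head(setdiff(clean, known)))
}
```

If most of the unmapped IDs are simply absent from the package's ENSEMBL keys, that would point to genes without an Entrez counterpart in that release rather than to a problem with the call itself.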

Secondly, I decided to try annotating with EnsDb.Hsapiens.v86 instead; a complete stab in the dark in an effort to understand more. This resolved some of the NAs, but many of the Entrez values I previously got from org.Hs.eg.db are now different. In fact, I can see two different Ensembl IDs being assigned the same Entrez ID depending on the annotation DB. Which annotation DB is appropriate?
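Since `multiVals="first"` hides any one-to-many relationships, it may help to pull the full set of Ensembl/Entrez pairs with `select()` and list the Entrez IDs that are attached to more than one Ensembl ID. A minimal sketch follows; `find_shared` is a hypothetical helper, the toy pairs are invented, and the real-data part assumes org.Hs.eg.db is installed and `ens.str` is defined.

```r
# Return the rows whose `to` ID is shared by more than one distinct `from` ID.
find_shared <- function(pairs, from = "ENSEMBL", to = "ENTREZID") {
  pairs  <- unique(pairs[!is.na(pairs[[to]]), c(from, to)])
  n_from <- tapply(pairs[[from]], pairs[[to]], length)
  shared <- names(n_from)[n_from > 1]
  pairs[pairs[[to]] %in% shared, ]
}

# Toy pairs: hypothetical IDs, not real annotations.
toy_pairs <- data.frame(ENSEMBL  = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
                        ENTREZID = c("1", "1", "2", NA),
                        stringsAsFactors = FALSE)
print(find_shared(toy_pairs))

# Real data: runs only if org.Hs.eg.db is installed and ens.str is defined.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE) && exists("ens.str")) {
  pairs <- AnnotationDbi::select(org.Hs.eg.db::org.Hs.eg.db,
                                 keys = unique(ens.str),
                                 columns = "ENTREZID", keytype = "ENSEMBL")
  print(head(find_shared(pairs)))
}
```

Swapping the `from` and `to` arguments gives the opposite view, i.e. Ensembl IDs that map to several Entrez IDs.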

Here's just a snippet of what I'm seeing (sorry about the formatting, the two entrez IDs in question are in bold):

Symbol    | Entrez | Entrez_ens | txbiotype            | LFC
NA        | NA     | 2920       | protein_coding       | -26.4976570057622
TAS2R3    | 50831  | 1417       | protein_coding       | -16.5810022683443
NA        | NA     | 50831      | protein_coding       | 17.5184870830614
NA        | NA     | 102724652  | protein_coding       | 14.3289350041311
ARHGAP11B | 89839  | NA         | processed_pseudogene | -16.7365692557264

The two entrez columns came from the sources: Entrez = org.Hs.eg.db. Entrez_ens = EnsDb.Hsapiens.v86.
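To put numbers on how far the two sources disagree, the two columns can be cross-tabulated row by row. This is just a sketch: `compare_ids` is a hypothetical helper, and the vectors below are copied from the five rows of the snippet above rather than from the full results.

```r
# Classify each row by whether the two ID columns agree, differ, or are missing.
compare_ids <- function(a, b) {
  status <- ifelse(is.na(a) & is.na(b), "both_NA",
            ifelse(is.na(a), "only_b",
            ifelse(is.na(b), "only_a",
            ifelse(a == b, "agree", "differ"))))
  table(factor(status,
               levels = c("agree", "differ", "only_a", "only_b", "both_NA")))
}

# The five rows from the snippet: a = Entrez (org.Hs.eg.db),
# b = Entrez_ens (EnsDb.Hsapiens.v86).
entrez     <- c(NA, "50831", NA, NA, "89839")
entrez_ens <- c("2920", "1417", "50831", "102724652", NA)
print(compare_ids(entrez, entrez_ens))

# Entrez IDs that occur in both columns, possibly on different rows
print(intersect(na.omit(entrez), na.omit(entrez_ens)))
```

On the full table the same call would be `compare_ids(res$Entrez, res$Entrez_ens)`, where `res` stands for whatever data frame holds those two columns.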

As much information as you can spare is greatly appreciated.

RNA-Seq ontologies GO annotations • 858 views