Empty ENSEMBL gene annotations

7.4 years ago

emblake ▴ 90

I'm using org.Hs.eg.db to annotate the ENSEMBL gene IDs from a read-count matrix, and after the annotation list is created, ~3,700 genes have NA for the annotation. If I keep these unannotated genes, perform DE testing in edgeR, and retrieve the topTags, I see quite a few of them in the list. Some are lincRNAs (ENSG00000205959, ENSG00000260743, etc.), pseudogenes (ENSG00000236297, ENSG00000232901, etc.), and antisense RNAs (ENSG00000267734).

How should I deal with these genes? Should I remove any genes without annotations before normalization and DE testing, or leave them in the analysis? Removing them would obviously make adding annotations and generating figures cleaner, but I'm not sure what the convention is. Thanks.

rna-seq annotations R ensembl • 1.6k views

ADD COMMENT • link 7.4 years ago by emblake ▴ 90

They're just that: unannotated. I'd keep them; they may represent novel findings.

7.4 years ago by pld 5.1k
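A minimal sketch of keeping the unannotated genes while still getting readable labels for tables and figures: use the symbol where the lookup found one and fall back to the Ensembl ID otherwise. The `annotations` dictionary is a hypothetical stand-in for the ID-to-symbol mapping an org.Hs.eg.db lookup produces (with `None` playing the role of R's `NA`), and `display_label` is an illustrative helper, not part of any package.

```python
from typing import Dict, Optional


def display_label(ensembl_id: str, symbol: Optional[str]) -> str:
    """Return the gene symbol if one was found, otherwise the Ensembl ID."""
    # Treat both a missing value (None, i.e. NA) and an empty string as unannotated.
    return symbol if symbol else ensembl_id


# Hypothetical lookup result: None stands in for NA from the annotation step.
annotations: Dict[str, Optional[str]] = {
    "ENSG00000141510": "TP53",
    "ENSG00000205959": None,  # lincRNA with no symbol in the lookup
    "ENSG00000236297": None,  # pseudogene with no symbol in the lookup
}

# Every gene keeps a label, so nothing has to be dropped to make figures readable.
labels = {gid: display_label(gid, sym) for gid, sym in annotations.items()}
for gid, label in labels.items():
    print(gid, label)
```

The unannotated genes stay in the analysis and simply appear under their Ensembl IDs in tables and plots.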

Thanks. I figured as much, but I've seen studies remove genes that lack official gene symbols: https://f1000research.com/articles/5-1438/v1. I'm just curious what the consensus is on this type of filtering.

7.4 years ago by emblake ▴ 90
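If a symbol-only table is still wanted, one option (a sketch of an alternative workflow, not necessarily what the linked article does) is to run normalization and DE testing on all genes and drop unannotated rows only when reporting, so both the normalization and the multiple-testing correction still see every gene. The rows below are hypothetical and `annotated_only` is an illustrative helper; in R the equivalent step would be subsetting the data frame inside the topTags result.

```python
from typing import Dict, List, Optional

# Hypothetical DE result rows, already ranked; "symbol" is None where the
# annotation lookup returned NA. Statistics such as logFC or FDR would be
# carried along untouched and are omitted here.
Row = Dict[str, Optional[str]]

ranked_results: List[Row] = [
    {"gene": "ENSG00000141510", "symbol": "TP53"},
    {"gene": "ENSG00000260743", "symbol": None},
    {"gene": "ENSG00000267734", "symbol": None},
]


def annotated_only(rows: List[Row]) -> List[Row]:
    """Keep only rows that have a gene symbol, preserving the original ranking.

    This is a reporting step only: no statistics are recomputed, so any
    adjusted p-values stay as they were computed on the full gene set.
    """
    return [row for row in rows if row["symbol"]]


symbol_table = annotated_only(ranked_results)
print([row["gene"] for row in symbol_table])
```

The full results (including the unannotated genes) remain available, and the filtered table is just a view of them for presentation.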