I'm using org.Hs.eg.db to annotate ENSEMBL gene IDs from a read counts matrix, and after my annotation list is created, ~3700 genes have 'NA' for the annotation. If I keep these unannotated genes, perform DE testing in edgeR, and retrieve my topTags, I see quite a few of the unannotated genes in the list. Some of these genes are lincRNAs (ENSG00000205959, ENSG00000260743, etc.), pseudogenes (ENSG00000236297, ENSG00000232901, etc.), and antisense RNAs (ENSG00000267734).
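For reference, the annotation step looks roughly like the sketch below. The toy `counts` matrix, the choice of the SYMBOL column, and `multiVals = "first"` are assumptions for illustration only, not necessarily the exact settings of the real analysis.

```r
library(org.Hs.eg.db)   # attaches AnnotationDbi, which provides mapIds()

# Toy stand-in for the real read counts matrix (assumption):
# rows are ENSEMBL gene IDs, columns are samples.
ids <- c("ENSG00000141510", "ENSG00000205959",
         "ENSG00000236297", "ENSG00000267734")
counts <- matrix(rpois(length(ids) * 4, lambda = 50),
                 nrow = length(ids),
                 dimnames = list(ids, paste0("sample", 1:4)))

# Strip version suffixes (e.g. ".12") in case the IDs carry them,
# since org.Hs.eg.db keys are unversioned.
ens <- sub("\\..*$", "", rownames(counts))

# Map ENSEMBL IDs to gene symbols; IDs without a mapping come back as NA.
symbol <- mapIds(org.Hs.eg.db, keys = ens, keytype = "ENSEMBL",
                 column = "SYMBOL", multiVals = "first")

# Flag and count the genes that received no annotation.
unannotated <- is.na(symbol)
table(unannotated)
```

The `unannotated` vector is in the same row order as `counts`, so it can be used either to subset the matrix or simply to label the rows later.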
How do I deal with these genes? Should I filter out any genes that do not have annotations prior to normalization and DE testing, or should I leave them in the analysis? Obviously, removing them makes adding annotations and generating figures cleaner, but I'm not sure what the convention is. Thanks.