Hi, I am planning to run differential expression analysis followed by enrichment analysis on an RNA-seq experiment from Arabidopsis. The RNA-seq count table has TAIR IDs as rows. I suspect that some of these IDs are non-coding genes (e.g. microRNAs), so when I convert TAIR IDs to Entrez IDs, only around 80% are convertible (20% have no associated Entrez ID). Since the purpose of the study is to detect DE genes and perform enrichment analysis (and most non-coding genes have no annotation available for enrichment analysis), is it wise to keep only the convertible IDs and remove the rest (20% of features) from the beginning, i.e. before running the DE analysis?
It probably won't make much of a difference either way. Including all the genes might give you very slightly more accurate normalization and dispersion estimates, but will give you a slightly worse multiple testing burden. Genes without an Entrez ID will be automatically ignored in any enrichment analysis anyway.
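
To make the multiple testing point concrete, here is a minimal sketch in plain Python. It assumes Benjamini-Hochberg adjustment, which is a common choice but is not named in the thread. The p-values are made up for illustration, and the two unannotated features are assumed not to be DE. `bh_adjust` is a hypothetical helper, not a function from any DE package.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for 8 features that have an Entrez ID.
annotated = [0.001, 0.004, 0.012, 0.03, 0.2, 0.45, 0.6, 0.9]
# Made-up raw p-values for 2 features without an Entrez ID (20% of 10),
# assumed here to be non-DE.
unannotated = [0.5, 0.8]

# Adjust with the unannotated features removed beforehand.
filtered = bh_adjust(annotated)
# Adjust with everything included, then look at the annotated features only.
full = bh_adjust(annotated + unannotated)[:len(annotated)]

for p, f, a in zip(annotated, filtered, full):
    print(f"raw={p:<6} filtered={f:.4f} all_features={a:.4f}")
```

In this toy case every annotated feature gets an adjusted p-value that is the same or somewhat larger when the extra features are kept, which is the "slightly worse" burden described above. The size of the effect depends on the real data, and it could go the other way if many of the unannotated features were strongly DE.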