The GTF file should typically contain annotation on the gene type, so you can select, for example, only protein_coding genes. That is what it is called in the GENCODE files; I am not sure what the exact term is in a RefSeq GTF. Before doing DEG analysis I typically remove a couple of RNA species: all small RNAs (miRNA, snoRNA, snRNA, ...), because they are not well captured in standard RNA-seq and therefore not reliable, and TEC (to be experimentally confirmed) genes. What is left is (at least imho) the meaningful part of a standard RNA-seq experiment, which then goes into dispersion estimation and model fitting.

What you could probably do is remove further gene types after the main DEG analysis but prior to multiple testing correction, if you feel that too many genes are losing significance to the FDR correction. The FDR-adjusted p-values are definitely affected by the number of tested genes, so reducing the number of tests might make sense. From a strategic standpoint you could probably remove everything for which you know a priori that you would not benefit from finding it significant, such as non-protein-coding transcripts of unknown function. Finding these significant might well point to an interesting biological finding, but if that is not the focus of your study, or you are not willing to chase something like lincRNAs, you could filter them out to reduce the multiple testing burden. I am by no means a statistician, so feel free to comment.
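
To make the idea concrete, here is a minimal Python sketch of the two filtering steps, not a definitive implementation. It assumes GENCODE-style attributes (`gene_id`, `gene_type`); the list of dropped types, the toy GTF records and the toy p-values are all made up for illustration, and the Benjamini-Hochberg adjustment is written out by hand instead of taken from a DEG package. In a real analysis the p-values would come from your DEG tool.

```python
import re

# Gene types removed before the DEG analysis.
# Assumption: GENCODE "gene_type" names; adapt the set to your annotation.
DROP_BEFORE = {"miRNA", "snoRNA", "snRNA", "TEC"}


def gene_types(gtf_lines, key="gene_type"):
    """Map gene_id -> gene type, read from the 'gene' records of a GTF."""
    id_pat = re.compile(r'\bgene_id "([^"]+)"')
    type_pat = re.compile(r"\b" + re.escape(key) + r' "([^"]+)"')
    out = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records carry what we need here
        gid = id_pat.search(fields[8])
        gtype = type_pat.search(fields[8])
        if gid and gtype:
            out[gid.group(1)] = gtype.group(1)
    return out


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy GTF records and raw p-values (hypothetical, for illustration only).
toy_gtf = [
    'chr1\tHAVANA\tgene\t1\t900\t.\t+\t.\tgene_id "G1"; gene_type "protein_coding";',
    'chr1\tHAVANA\tgene\t1000\t1900\t.\t+\t.\tgene_id "G2"; gene_type "lncRNA";',
    'chr1\tENSEMBL\tgene\t2000\t2080\t.\t-\t.\tgene_id "G3"; gene_type "miRNA";',
    'chr1\tHAVANA\tgene\t3000\t3900\t.\t+\t.\tgene_id "G4"; gene_type "protein_coding";',
]
toy_pvalues = {"G1": 0.01, "G2": 0.5, "G4": 0.04}  # G3 never entered the test

types = gene_types(toy_gtf)
tested = [g for g in toy_pvalues if types[g] not in DROP_BEFORE]
coding = [g for g in tested if types[g] == "protein_coding"]

# Adjust once over everything tested, once over protein-coding genes only.
padj_all = dict(zip(tested, bh_adjust([toy_pvalues[g] for g in tested])))
padj_coding = dict(zip(coding, bh_adjust([toy_pvalues[g] for g in coding])))
print(padj_all)
print(padj_coding)
```

In this toy case the adjusted p-value of G4 drops from about 0.06 to 0.04 once the lncRNA is removed before correction, which is the effect described above. The important point is that the list of gene types to drop is fixed from the annotation alone, before looking at any p-values.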