I am running a differential gene expression (DGE) analysis on a total RNA-seq dataset with 2 timepoints (5 replicates each). I am particularly interested in expression changes in a specific set of 1000 genes.
So far, I have run the analysis on all genes and then picked out the 1000 genes I am interested in. However, my PI has suggested that I could instead run the DGE analysis on just those 1000 genes. In theory, this should give smaller adjusted p-values, since the multiple hypothesis testing correction would cover only 1000 tests rather than every gene in the dataset.
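To make the two orders of operation concrete, here is a minimal sketch in Python. It assumes Benjamini-Hochberg adjustment (the default method behind DESeq2's `padj`) and uses made-up p-values; the function name `bh_adjust` and the gene indices are hypothetical and only there for illustration.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for 10 genes (assumption, not real data).
all_pvalues = [0.001, 0.004, 0.02, 0.03, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9]
# Hypothetical positions of the genes of interest.
genes_of_interest = [0, 2, 4]

# Option A: adjust across all genes, then pick the genes of interest.
adjusted_all = bh_adjust(all_pvalues)
filter_after = [adjusted_all[i] for i in genes_of_interest]

# Option B: keep only the genes of interest, then adjust.
filter_before = bh_adjust([all_pvalues[i] for i in genes_of_interest])

print("adjust then subset:", filter_after)
print("subset then adjust:", filter_before)
```

This only covers the p-value adjustment step. It says nothing about how subsetting the count matrix before running DESeq2 would affect size factor or dispersion estimation, and the adjusted values are not guaranteed to be smaller for every possible subset.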
Is this an advisable way of doing the analysis? Since the counts are modelled with a negative binomial distribution (in the case of DESeq2), wouldn't this just mean that most of the 1000 genes I input would end up not being differentially expressed?
Edit: We arrived at the list of 1000 genes because we are particularly interested in genes encoding small proteins. We therefore searched UniProt for human proteins with a maximum length of 100 amino acids.