I have a custom list of 100 genes that I curated manually to obtain their full CDS, and I would like to perform a differential expression (DE) analysis between samples for this specific subset. I know I cannot simply map all the reads onto this subset and run the DE analysis on it, because that would bias the normalization (with DESeq2 or edgeR): both tools assume that most genes are not differentially expressed, which may not hold for 100 hand-picked genes, and reads from other loci could be forced onto them. So I need to map all the reads to the whole genome.
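To make the normalization concern concrete, here is a simplified re-implementation of the median-of-ratios idea behind DESeq2's default size factors (this is not DESeq2's code, and edgeR's TMM works differently while sharing the "most genes are not DE" assumption). The toy counts are invented for illustration: sample B is sequenced twice as deep, and the three "curated" genes are also truly up-regulated in B.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, in the spirit of DESeq2's default.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped (their geometric mean is 0).
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_ratios = [[] for _ in range(n_samples)]
    for row in usable:
        # log of the per-gene geometric mean across samples
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # size factor = median ratio of each sample to the pseudo-reference
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical toy data: columns are samples A and B.
background = [[10, 20], [50, 100], [30, 60], [100, 200], [8, 16]]
curated = [[10, 80], [20, 160], [5, 40]]

sf_all = size_factors(background + curated)  # B/A ratio ~2: the real depth difference
sf_sub = size_factors(curated)                # B/A ratio ~8: depth and true DE are confounded
```

With only the curated genes, the size factors absorb the real up-regulation, so the normalized fold change for those genes collapses to about 1. The background genes are what let the method separate sequencing depth from biology.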
Fortunately, I also have the raw genome sequence (a multi-FASTA file) as well as an automatic annotation in GFF format. The problem is that this annotation is not accurate enough for the 100 curated genes.
My plan was to (1) run BLAT to get the exact genomic coordinates of my manually curated genes, (2) merge the resulting GFF with the original (automatic, non-curated) one using Cufflinks or gffcompare, and (3) count the reads mapped to the genome using this merged annotation and run DESeq2 on those counts.
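For step (1), BLAT reports hits in PSL format, which still has to be turned into GFF before merging. Below is a minimal sketch of that conversion, assuming a nucleotide CDS query (not translated BLAT), one best hit per gene (filtered beforehand, for example with UCSC's pslReps), a fully aligned CDS, and that every gap between aligned blocks is an intron. The source tag `blat_curated` and the `.t1`/`.exonN` ID scheme are hypothetical choices.

```python
import sys


def psl_to_gff3(psl_line, source="blat_curated"):
    """Turn one BLAT PSL record (nucleotide CDS vs genome) into GFF3 lines.

    Assumes: nucleotide query, whole CDS aligned (qStart == 0, qEnd == qSize),
    and every gap between blocks on the genome is an intron.
    """
    f = psl_line.rstrip("\n").split("\t")
    strand, q_name = f[8], f[9]
    q_size, q_start, q_end = int(f[10]), int(f[11]), int(f[12])
    t_name, t_start, t_end = f[13], int(f[15]), int(f[16])
    if len(strand) != 1:
        raise ValueError(f"{q_name}: translated alignment, not handled here")
    if (q_start, q_end) != (0, q_size):
        raise ValueError(f"{q_name}: partial alignment, check it by hand")
    sizes = [int(x) for x in f[18].rstrip(",").split(",")]
    starts = [int(x) for x in f[20].rstrip(",").split(",")]
    # PSL target coordinates are 0-based half-open; GFF3 is 1-based inclusive.
    blocks = [(s + 1, s + n) for s, n in zip(starts, sizes)]

    gene_id, mrna_id = q_name, f"{q_name}.t1"  # hypothetical ID scheme
    feats = [
        (t_start + 1, t_end, "gene", ".", f"ID={gene_id}"),
        (t_start + 1, t_end, "mRNA", ".", f"ID={mrna_id};Parent={gene_id}"),
    ]
    # CDS phase follows transcription order: left-to-right on '+', right-to-left on '-'.
    phase, done = {}, 0
    for b in (blocks if strand == "+" else blocks[::-1]):
        phase[b] = (3 - done % 3) % 3
        done += b[1] - b[0] + 1
    for i, b in enumerate(blocks, 1):
        feats.append((b[0], b[1], "exon", ".", f"ID={mrna_id}.exon{i};Parent={mrna_id}"))
        feats.append((b[0], b[1], "CDS", str(phase[b]), f"ID={mrna_id}.cds;Parent={mrna_id}"))
    return [f"{t_name}\t{source}\t{ftype}\t{s}\t{e}\t.\t{strand}\t{ph}\t{attr}"
            for s, e, ftype, ph, attr in feats]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python psl_to_gff3.py best_hits.psl > curated.gff3
    print("##gff-version 3")
    with open(sys.argv[1]) as psl:
        for line in psl:
            if line[:1].isdigit():  # data lines start with the match count; skips the psLayout header
                print("\n".join(psl_to_gff3(line)))
```

Both `exon` and `CDS` features are written because counting tools such as featureCounts and htseq-count count on `exon` features by default. This sketch does not remove the automatic gene models that overlap the curated ones; that still has to be handled in step (2).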
Does anyone have suggestions about this protocol, or alternative tools to recommend?