How does reference genome/transcriptome affect gene expression significance?
8 weeks ago
sbitterw ▴ 10

I am examining the impact of stressors on gene expression in corals using TagSeq. This method relies on a reference transcriptome/genome to identify genes from transcriptome data. Fortunately, there are quite a few transcriptomes/genomes for my organism. However, the number of genes recorded in those references varies: one genome has ~30,000 genes while another has ~60,000 genes.

In DESeq2, the choice of reference greatly affected the analysis: the reference with fewer genes yielded more differentially expressed genes, while with the larger genome no genes were differentially expressed.

Am I right in thinking that the number of genes in the dataset affects the FDR-adjusted p-values? How should I address this disparity between genomes? Should I simply rely on a reference transcriptome instead?
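
As a minimal sketch of the effect being asked about, the Python below applies a Benjamini-Hochberg adjustment (the default `pAdjustMethod` in DESeq2's `results()`) to the same handful of small raw p-values, once among 30,000 tests and once among 60,000. All p-values here are made up for illustration, the evenly spaced "null" p-values are only a stand-in for genes with no effect, and `bh_adjust` and `null_pvals` are hypothetical helper names. It shows the multiple-testing arithmetic only, not the full DESeq2 pipeline, which also applies independent filtering by default.

```python
def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def null_pvals(n):
    """Hypothetical stand-in for n null genes: p-values spread evenly over (0, 1)."""
    return [(k + 0.5) / n for k in range(n)]


# Assumption: 10 genes with identical, made-up raw p-values.
signal = [1e-5] * 10

# The same 10 genes tested alongside enough null genes to reach 30,000 or 60,000 tests.
small = bh_adjust(signal + null_pvals(30_000 - len(signal)))
large = bh_adjust(signal + null_pvals(60_000 - len(signal)))

# Count how many of the 10 signal genes pass padj < 0.05 in each case.
n_sig_small = sum(p < 0.05 for p in small[:len(signal)])
n_sig_large = sum(p < 0.05 for p in large[:len(signal)])

print("significant among 30,000 tests:", n_sig_small)
print("significant among 60,000 tests:", n_sig_large)
```

With these made-up numbers, the same raw p-values pass the 0.05 cutoff among 30,000 tests but not among 60,000, because each adjusted value scales with the total number of tests. A real comparison between references would also differ in how reads are assigned to gene models, which this sketch does not capture.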

