I am trying to carry out GO term enrichment analysis on a set of differentially methylated genomic regions (DMRs), using DAVID.
My data are derived from reduced-representation bisulfite sequencing (RRBS) of liver tissue. Although I am working on a non-model species, a reference genome from a relatively closely related species is available, and I have used it to annotate the DMRs where they overlap with known genes.
I therefore have a gene list that I would like to use for GO term enrichment analysis and to explore other aspects of functional annotation. However, when carrying out these sorts of analyses in DAVID, a background list is required for statistical comparison. This usually involves taking the list of genes known from that particular reference genome (e.g. 30,000 in humans) and then testing whether any particular gene categories are over-represented in my gene list in comparison to the genomic background. With RRBS data, I am only sequencing a subset of the genome, so the genes in my gene list can only come from that subset. Testing whether gene categories are over-represented in my DMR data set by comparing against every gene known from that organism therefore does not make much sense to me.
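To make the concern concrete, here is a minimal Python sketch of a plain hypergeometric over-representation test, run on the same gene list against two different backgrounds. Everything in it is an assumption for illustration: the gene IDs are made up, the function names `hypergeom_upper_tail` and `enrichment_p` are hypothetical, and this is not necessarily the exact statistic DAVID computes. It only shows how much the choice of background can move the p-value.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when drawing n genes from a background of N genes,
    K of which belong to the category of interest."""
    upper = min(n, K)
    if k > upper:
        return 0.0
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


def enrichment_p(gene_list, category_genes, background):
    """Restrict the gene list and the category to the background, then test."""
    background = set(background)
    genes = set(gene_list) & background
    category = set(category_genes) & background
    k = len(genes & category)
    return hypergeom_upper_tail(k, len(genes), len(category), len(background))


# Toy data (made up): 1000 annotated genes, 200 of them covered by RRBS.
whole_genome = [f"g{i}" for i in range(1000)]
rrbs_covered = [f"g{i}" for i in range(200)]

# A category of 50 genes that all happen to sit in RRBS-covered regions.
category = [f"g{i}" for i in range(50)]

# 20 DMR genes, 8 of which fall in the category.
dmr_genes = [f"g{i}" for i in range(8)] + [f"g{i}" for i in range(50, 62)]

p_genome = enrichment_p(dmr_genes, category, whole_genome)
p_covered = enrichment_p(dmr_genes, category, rrbs_covered)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"RRBS-covered background: p = {p_covered:.3g}")
```

In this toy setup the category looks far more enriched against the whole genome than against the RRBS-covered genes, simply because the category is itself over-represented among the regions RRBS can see.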
Does anyone know how to generate an appropriate background list for RRBS functional annotation analysis? I guess the same issue applies to other reduced-representation sequencing techniques such as RAD-seq. Alternatively, does anyone know of any other methods or programs for GO term enrichment analysis that take into account the biased sampling of the genome involved in RRBS?
Thanks in advance for any advice you can give.