I am comparing a couple of strains of a parasitic fungus, and overall the strains seem pretty similar: in a 3 Mb genome there are about 10k homozygous SNPs between these isolates. However, 2 isolates of interest share 400 of these SNPs.
I am interested in testing for enrichment. In other words, are the genes affected by these 400 SNPs different from the genes affected by the overall 10k SNPs?
I know how to do this in the simplest way, when I have just a list_1 of genes and a subset list_2. However, here we do not have just lists; we also have the number of SNPs per gene. Some genes are affected by 10 SNPs and some by only 1, and surely this needs to be taken into account.
I usually do this using KOG/GO terms or KEGG pathways.
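To make the question concrete, here is a minimal sketch of one naive way the per-gene counts could be folded in: run the usual hypergeometric test per term, but count SNPs instead of genes. The function name, the three input dictionaries, and the toy numbers are all hypothetical, and the sketch assumes the 400 shared SNPs are a subset of the 10k and that SNPs behave as independent draws, which is probably not true for SNPs sitting in the same gene.

```python
from math import comb


def snp_level_enrichment(all_snps, shared_snps, gene_terms):
    """Per-term hypergeometric test on SNP counts instead of gene counts.

    all_snps:    {gene: number of SNPs among the background (~10k) SNPs}
    shared_snps: {gene: number of SNPs among the shared (400) SNPs}
    gene_terms:  {gene: iterable of KOG/GO/KEGG terms}  (hypothetical layout)
    Returns {term: (k, K, p)} where p = P(X >= k) under the hypergeometric model.
    """
    N = sum(all_snps.values())      # total background SNPs
    n = sum(shared_snps.values())   # total shared SNPs (the "draw")
    denom = comb(N, n)

    # Sum SNP counts per term, so a gene with 10 SNPs contributes 10, not 1.
    background, shared = {}, {}
    for gene, count in all_snps.items():
        for term in gene_terms.get(gene, ()):
            background[term] = background.get(term, 0) + count
            shared[term] = shared.get(term, 0) + shared_snps.get(gene, 0)

    results = {}
    for term, K in background.items():
        k = shared[term]
        # Upper tail: probability of seeing k or more shared SNPs in this term.
        tail = sum(comb(K, i) * comb(N - K, n - i)
                   for i in range(k, min(K, n) + 1))
        results[term] = (k, K, tail / denom)
    return results


# Toy, made-up data purely to show the call shape.
toy_all = {"g1": 10, "g2": 1, "g3": 9}
toy_shared = {"g1": 4}
toy_terms = {"g1": ["T1"], "g2": ["T1", "T2"], "g3": ["T2"]}
for term, (k, K, p) in snp_level_enrichment(toy_all, toy_shared, toy_terms).items():
    print(term, k, K, p)
```

The p-values from something like this would still need multiple-testing correction across terms, and they are likely too optimistic if SNPs within a gene are linked, which is part of why I am unsure this is a reasonable way to do it.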
Can anyone propose an approach that would seem reasonable?