Hello everyone!
I want to perform a pathway overrepresentation analysis using a SNP dataset. I have over 100 SNPs significantly associated with the trait of interest, and I want to find the genes potentially related to these SNPs and include them in the analysis. What I cannot decide is the distance threshold: how far from a SNP can a gene be and still be included in the pathway overrepresentation analysis?
I am currently considering three options:
1) Take the genes that SnpEff annotates as being close to the SNPs. With default settings, that means within 5 kb upstream or downstream of a gene.
2) Based on the SNP data, calculate the average LD decay across all chromosomes and use as the threshold the average distance over which LD stays strong, for example r2 > 0.5.
3) Use the threshold that was previously used to identify genes of interest in the organism under study (it was 100 kb).
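To make the comparison concrete, here is a rough Python sketch of how the three thresholds could be tried on the same data. Everything in it is an assumption for illustration: the function names `genes_near_snps` and `ld_decay_distance`, the toy coordinates, the 10 kb distance bins, and the input layout (SNPs as `(chrom, pos)`, genes as `(gene_id, chrom, start, end)`, LD as `(distance_bp, r2)` pairs). It is not what SnpEff or any LD tool does internally.

```python
def genes_near_snps(snps, genes, window_bp):
    """Return IDs of genes whose span, padded by window_bp on both
    sides, contains at least one SNP on the same chromosome."""
    hits = set()
    for chrom, pos in snps:
        for gene_id, g_chrom, start, end in genes:
            if g_chrom == chrom and start - window_bp <= pos <= end + window_bp:
                hits.add(gene_id)
    return hits


def ld_decay_distance(pairs, r2_cutoff=0.5, bin_bp=10_000):
    """Bin pairwise (distance_bp, r2) values by distance and return the
    lower edge of the first bin whose mean r2 is below r2_cutoff.
    Returns None if no bin falls below the cutoff."""
    sums = {}
    for dist, r2 in pairs:
        b = dist // bin_bp
        total, n = sums.get(b, (0.0, 0))
        sums[b] = (total + r2, n + 1)
    for b in sorted(sums):
        total, n = sums[b]
        if total / n < r2_cutoff:
            return b * bin_bp
    return None


# Toy data, made up for illustration only.
snps = [("chr1", 12_000), ("chr1", 250_000), ("chr2", 5_000)]
genes = [
    ("geneA", "chr1", 15_000, 20_000),
    ("geneB", "chr1", 300_000, 310_000),
    ("geneC", "chr2", 90_000, 95_000),
]
ld_pairs = [(2_000, 0.9), (8_000, 0.7), (15_000, 0.6),
            (18_000, 0.5), (25_000, 0.3), (40_000, 0.1)]

ld_window = ld_decay_distance(ld_pairs)  # option 2: data-driven threshold
for label, window in [("5 kb", 5_000), ("LD-based", ld_window), ("100 kb", 100_000)]:
    print(label, window, sorted(genes_near_snps(snps, genes, window)))
```

Running the overrepresentation analysis on each resulting gene list would show how sensitive the enriched pathways are to the choice of threshold.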
Please share your experience and suggest which option to choose!