I'm new to human statistical genetics, so I'm a little lost.
Specifically, I would like to know what the most practical approach would be for me to tackle the following task:
Starting with a set of genomic regions (hg19) in bam format, how can I efficiently generate the set of SNPs (from 1000 Genomes, EUR) that are in LD (for example, r^2 > 0.6) with the SNPs in each of these regions?
What I think I need to do is:
1. Find all SNPs in the regions of interest.
2. Use software (e.g. PLINK) to calculate the pairwise LD between each of those SNPs and every SNP within (for example) 100 kb of it.
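To make the two steps concrete, here is a minimal Python sketch of the computation I have in mind. It assumes phased, biallelic SNPs on a single chromosome, alleles coded as 0/1 per haplotype, positions sorted in ascending order, and region coordinates that are 1-based and inclusive. The function names (`r_squared`, `ld_partners`) and the toy haplotypes are made up for illustration; they are not 1000 Genomes data and not the API of any tool. It uses the haplotype-based definition r^2 = D^2 / (pA(1-pA) pB(1-pB)) with D = pAB - pA*pB.

```python
import math
from bisect import bisect_left, bisect_right


def r_squared(hap_a, hap_b):
    """r^2 between two biallelic SNPs given phased 0/1 alleles per haplotype."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n                                  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                                  # frequency of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n   # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return math.nan                                   # monomorphic SNP: r^2 is undefined
    d = p_ab - p_a * p_b                                  # LD coefficient D
    return d * d / denom


def ld_partners(positions, haplotypes, start, end, window=100_000, min_r2=0.6):
    """Step 1: find SNPs with start <= position <= end.
    Step 2: for each of them, keep SNPs within `window` bp whose r^2 exceeds `min_r2`.

    positions  : sorted SNP positions on one chromosome (assumed 1-based)
    haplotypes : haplotypes[i] is the 0/1 allele list for the SNP at positions[i]
    Returns {index_snp_position: [(partner_position, r2), ...]}.
    """
    first = bisect_left(positions, start)
    last = bisect_right(positions, end)
    result = {}
    for i in range(first, last):
        lo = bisect_left(positions, positions[i] - window)
        hi = bisect_right(positions, positions[i] + window)
        partners = []
        for j in range(lo, hi):
            if j == i:
                continue
            r2 = r_squared(haplotypes[i], haplotypes[j])
            if r2 > min_r2:                               # NaN compares False, so it is skipped
                partners.append((positions[j], r2))
        result[positions[i]] = partners
    return result


# Toy data, invented for illustration only: 4 SNPs, 8 haplotypes.
toy_positions = [1_000, 1_500, 50_000, 250_000]
hap_x = [0, 0, 1, 1, 0, 1, 0, 1]
hap_y = [0, 1, 0, 1, 0, 1, 0, 1]
toy_haplotypes = [hap_x, hap_x, hap_y, hap_x]

print(ld_partners(toy_positions, toy_haplotypes, start=900, end=2_000))
```

With real data this brute-force loop would presumably be far too slow, which is why I'm looking at existing software for step 2.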
My questions are:
Am I on the right track? Is there an easier way? For instance, I have been considering SNAP and HaploReg, but at least the web version of SNAP appears to be out of date.
What is the best (least error-prone) way for me to achieve step #1 above (i.e., finding all SNP IDs in a given region)? I'm completely baffled by how SNP IDs can change from build to build.
Thank you