Question: determining appropriate window size for identifying highly differentiated SNPs
Ana wrote (2.7 years ago):

Hi all, I am trying to identify highly differentiated SNPs using a window-based approach. I have two questions. First, should I use sliding windows or adjacent windows? Second, is there a reference that would help me decide which window size to try?

I would appreciate any appropriate reference that points me in that direction. Thanks!

genome-scan window-size • 1.9k views
modified 2.7 years ago by Kevin Blighe • written 2.7 years ago by Ana
Kevin Blighe wrote (2.7 years ago):

I believe that you are referring to linkage disequilibrium (LD) here.

People generally use a sliding window of 50 SNPs, shifting the window by 5 SNPs each time, and then calculate the variance inflation factor (VIF) and LD within each window. LD can be measured in different ways; take a look at this Biostars thread: linkage disequilibrium: difference between D' and r-squared

PLINK can be used to measure this for you, assuming your data is in PLINK format: https://www.cog-genomics.org/plink/1.9/ld
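
To make the 50-SNP window with a 5-SNP shift concrete, here is a minimal Python sketch of the windowing logic. It is not PLINK's implementation. It assumes genotypes are coded as 0/1/2 allele dosages and uses the squared Pearson correlation between dosages as the LD measure, which is only one of several possible estimates. The function names and the input layout (one list of per-sample dosages per SNP) are illustrative assumptions.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation is undefined, treated as 0 here
    return (sxy * sxy) / (sxx * syy)

def sliding_windows(n_snps, size=50, step=5):
    """Yield (start, end) SNP index ranges, end exclusive; the last window may be shorter."""
    start = 0
    while start < n_snps:
        end = min(start + size, n_snps)
        yield start, end
        if end == n_snps:
            break
        start += step

def mean_r2_per_window(genotypes, size=50, step=5):
    """genotypes: list of SNPs, each a list of per-sample dosages (assumed layout)."""
    results = []
    for start, end in sliding_windows(len(genotypes), size, step):
        pairs = [r_squared(genotypes[i], genotypes[j])
                 for i in range(start, end) for j in range(i + 1, end)]
        results.append((start, end, mean(pairs) if pairs else 0.0))
    return results
```

With 60 SNPs and the default settings, `list(sliding_windows(60))` gives the ranges (0, 50), (5, 55) and (10, 60), so neighbouring windows share 45 SNPs.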

modified 2.7 years ago • written 2.7 years ago by Kevin Blighe

I'm curious why you favour a sliding window approach. It should lead to a lot of overlapping results; wouldn't adjacent windows be more efficient?

written 2.6 years ago by James Reeve

Yes, it generates more information than the adjacent window approach. However, the adjacent window approach has a major flaw: it only looks within blocks and therefore won't give me information on linkage disequilibrium (LD) between the blocks.

Suppose I have SNPs numbered 1-20 and I analyse them in adjacent blocks of 5 SNPs. I will not get information on LD between SNPs 1-5 and those in any other block. If I use a sliding window of 5 SNPs and shift the window by 1 SNP each time, however, I will get a more continuous picture of LD across these 20 SNPs.
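
The 20-SNP example can be checked with a short Python sketch that lists which SNP pairs ever share a window under each scheme. The helper names are made up for illustration, and SNPs are simply numbered 1-20 as in the example.

```python
def adjacent_blocks(n_snps, size):
    """Non-overlapping blocks of 1-based SNP numbers."""
    return [list(range(s, min(s + size, n_snps + 1)))
            for s in range(1, n_snps + 1, size)]

def sliding_blocks(n_snps, size, step=1):
    """Overlapping full-size windows of 1-based SNP numbers."""
    return [list(range(s, s + size))
            for s in range(1, n_snps - size + 2, step)]

def pairs_covered(windows):
    """Set of SNP pairs (i, j), i < j, that appear together in at least one window."""
    return {(i, j) for w in windows for i in w for j in w if i < j}

adjacent = pairs_covered(adjacent_blocks(20, 5))
sliding = pairs_covered(sliding_blocks(20, 5, 1))

print(len(adjacent), len(sliding))        # 40 70
print((5, 6) in adjacent, (5, 6) in sliding)  # False True
```

Adjacent blocks never compare SNP 5 with SNP 6 because they fall on opposite sides of a block boundary, whereas the sliding scheme covers every pair up to 4 SNPs apart. Neither scheme compares pairs further apart than the window size allows.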

The ideal situation is to actually use a variable-sized sliding window based on SNP density in relation to genomic distance.
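
One possible reading of a variable-sized window is a window defined by a fixed physical span, so that the number of SNPs it contains follows local SNP density. The Python sketch below illustrates that reading only. The `span_bp` parameter, the function name and the example coordinates are hypothetical, and other definitions of a density-aware window are equally plausible.

```python
from bisect import bisect_right

def distance_windows(positions, span_bp):
    """positions: sorted base-pair coordinates of SNPs on one chromosome.

    For each SNP, return the (start, end) index range (end exclusive) of the
    SNPs located within span_bp downstream of it, the SNP itself included.
    """
    windows = []
    for i, pos in enumerate(positions):
        end = bisect_right(positions, pos + span_bp)
        windows.append((i, end))
    return windows

# Hypothetical coordinates: a dense cluster, an isolated SNP, then a sparse pair.
positions = [100, 150, 180, 5_000, 40_000, 40_200]
for start, end in distance_windows(positions, span_bp=1_000):
    print(positions[start], end - start)  # window start position, SNPs in window
```

In the dense cluster the first window holds 3 SNPs, while the isolated SNP at 5,000 bp gets a window containing only itself, so the SNP count per window tracks density rather than being fixed.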

written 2.6 years ago by Kevin Blighe