Question

Default Plink Window size?

Asked by RNAseqer, 5.1 years ago

Hello all,

I have been looking at tagging SNPs at r² > 0.7, using a variety of plink window sizes (50-SNP, 225-SNP and 500-SNP windows, etc.), each with a step of 10% of the window size. The .ped files I am scanning are chunks of whole-chromosome files from the 1000 Genomes Project.
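
For anyone unsure what a SNP-count window with a 10% step actually compares, here is a minimal Python sketch. It assumes each window spans `window` consecutive SNPs and then shifts by `step` SNPs, with the last window truncated at the end of the data; `window_pairs` and `max_distance` are hypothetical helpers that only show which SNP pairs ever get tested, not how plink works internally.

```python
from itertools import combinations


def window_pairs(n_snps: int, window: int, step: int) -> set[tuple[int, int]]:
    """Return every (i, j) SNP index pair that falls inside at least one window.

    Hypothetical helper: windows of `window` consecutive SNPs, shifted by
    `step` SNPs each time, with the final window truncated at the data's end.
    """
    if window < 2 or step < 1:
        raise ValueError("window must be >= 2 and step >= 1")
    pairs: set[tuple[int, int]] = set()
    start = 0
    while True:
        end = min(start + window, n_snps)
        pairs.update(combinations(range(start, end), 2))
        if end == n_snps:
            break
        start += step
    return pairs


def max_distance(pairs: set[tuple[int, int]]) -> int:
    """Largest index gap (in SNPs) between any two compared SNPs."""
    return max(j - i for i, j in pairs)


# A 50-SNP window with a 10% step (5 SNPs) over 200 SNPs:
pairs = window_pairs(200, window=50, step=5)
print(max_distance(pairs))  # SNPs more than 49 positions apart are never compared
```

Each window tests window·(window−1)/2 pairs, so with the step fixed at 10% of the window the total number of pair evaluations grows roughly in proportion to window size, while the widest SNP gap that can be captured is window − 1.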

Of course, bigger windows mean slower calculations. So I was wondering whether there is a default, tried-and-true, standard plink window size in the literature for finding tagging SNPs, if there is one at all. Increasing the window size certainly seems to help up to a point, but after that it returns fewer and fewer new finds. Since I'm a bit pressed for time, I thought I'd ask this forum before launching an exhaustive survey of the literature.

In case it is of interest to anyone weighing the trade-off between bigger windows and slower analyses, here is a snippet of what I'm seeing:

    Window (SNPs)  Chr4         Chr5         Chr6         Chr7         Chr8
    50             0.211371226  0.20716799   0.181987946  0.219615699  0.175284882
    225            0.17404574   0.17294384   0.143189064  0.182387517  0.144435708
    500            0.170798296  0.170307534  0.136285545  0.178729012  0.140857786
    5000           0.170303995  0.169839802  0.131607191  0.17781175   0.139813532

Each number in the table, multiplied by 100, is the percentage of SNPs in the dataset that have NO correlation of r² > 0.7 with any other SNP: essentially the lone-wolf SNPs that no other SNP can tag. (I should mention that I filtered on MAF before this scan, in case those percentages seem odd for unfiltered SNP data.)
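
The lone-wolf statistic can be sketched in Python to make the definition concrete. Assumptions: r² is the squared Pearson correlation of 0/1/2 genotype dosages (plink's own estimator may differ), windows are counted in SNPs and shifted by `step`, and `r_squared` and `lone_wolf_fraction` are hypothetical names, not plink functions.

```python
from itertools import combinations


def r_squared(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two genotype-dosage vectors (0/1/2).

    Returns 0.0 when either SNP is monomorphic (zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def lone_wolf_fraction(genotypes: list[list[float]], window: int, step: int,
                       threshold: float = 0.7) -> float:
    """Fraction of SNPs with no partner at r^2 > threshold in any scanned window.

    `genotypes[k]` holds SNP k's dosages across samples; SNPs are in map order.
    """
    n = len(genotypes)
    if n == 0:
        return 0.0
    has_partner = [False] * n
    start = 0
    while True:
        end = min(start + window, n)
        for i, j in combinations(range(start, end), 2):
            # Skip the r^2 calculation if both SNPs already have a partner.
            if has_partner[i] and has_partner[j]:
                continue
            if r_squared(genotypes[i], genotypes[j]) > threshold:
                has_partner[i] = has_partner[j] = True
        if end == n:
            break
        start += step
    return has_partner.count(False) / n


snps = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0, so r^2 = 1
    [2, 0, 1, 1, 0, 2],  # SNP 2: uncorrelated with the others (r^2 = 0)
]
print(lone_wolf_fraction(snps, window=2, step=1))  # 1 of the 3 SNPs has no partner
```

Because pairs are only tested inside windows, a SNP counted as a lone wolf here may still have an r² > 0.7 partner further away, which is why the fraction can keep falling slightly as the window grows.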

I'm using the --show-tags and --show-all options to get tags and their target SNPs after all is said and done. For my final analysis I'm thinking of using the 225- or 500-SNP windows, with 23- and 50-SNP steps respectively. Would that be sufficient? Insufficient? Or overkill? I'm not trying to find literally every correlation above r² = 0.7; I just want to make sure enough unique markers are left over in a list of SNPs of interest.
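
To put the diminishing returns in numbers, this short Python snippet (an illustration only; the dictionary simply re-enters the values from the table of results) computes the percentage-point drop in lone-wolf SNPs for each increase in window size.

```python
# Fraction of MAF-filtered SNPs with no r^2 > 0.7 partner, re-entered from the table.
lone_wolf = {
    "Chr4": {50: 0.211371226, 225: 0.17404574, 500: 0.170798296, 5000: 0.170303995},
    "Chr5": {50: 0.20716799, 225: 0.17294384, 500: 0.170307534, 5000: 0.169839802},
    "Chr6": {50: 0.181987946, 225: 0.143189064, 500: 0.136285545, 5000: 0.131607191},
    "Chr7": {50: 0.219615699, 225: 0.182387517, 500: 0.178729012, 5000: 0.17781175},
    "Chr8": {50: 0.175284882, 225: 0.144435708, 500: 0.140857786, 5000: 0.139813532},
}
windows = [50, 225, 500, 5000]


def marginal_gains(fractions: dict[int, float], sizes: list[int]) -> list[float]:
    """Percentage-point drop in lone-wolf SNPs for each step up in window size."""
    return [100 * (fractions[a] - fractions[b]) for a, b in zip(sizes, sizes[1:])]


for chrom, fractions in lone_wolf.items():
    gains = marginal_gains(fractions, windows)
    steps = ", ".join(
        f"{a}->{b}: {g:.2f} pp" for (a, b), g in zip(zip(windows, windows[1:]), gains)
    )
    print(f"{chrom}: {steps}")
```

On these numbers, going from 50 to 225 SNPs removes roughly 3.4 to 4.0 percentage points of lone-wolf SNPs per chromosome, 225 to 500 removes about 0.26 to 0.69, and 500 to 5000 removes at most about 0.47 (Chr6) and under 0.11 on the other chromosomes.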

Tags: plink, window, SNP

Answer (2019-04-05)

Hey, I really do not believe there is any standard for this. Apart from anything else, your small summary table shows what one would expect as window size increases: large gains at first, then diminishing returns.

I would do some reading on tag SNP selection to see what others have used, and then go by that. If you have time, then I would also read up on the relationships between r-squared, haploblock size, and recombination rates.
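
As a starting point for that reading, here is a tiny Python illustration of how r² relates to haplotype frequencies and recombination. It uses the standard definitions D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)), plus the textbook idealised decay D_t = D_0·(1 - c)^t for recombination fraction c per generation. Random mating, an infinite population and fixed allele frequencies are assumed, and the function names are hypothetical.

```python
def r_squared_from_haplotypes(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic loci from haplotype frequency p_AB and allele frequencies."""
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom


def decayed_r_squared(r2_0: float, c: float, generations: int) -> float:
    """Expected r^2 after `generations` if D decays as D_t = D_0 * (1 - c)**t.

    Idealised model: random mating, infinite population, constant allele
    frequencies, so r^2 scales by (1 - c)**(2t).
    """
    return r2_0 * (1 - c) ** (2 * generations)


# Perfect LD: only AB and ab haplotypes, each at frequency 0.5.
r2 = r_squared_from_haplotypes(0.5, 0.5, 0.5)
print(r2)
# With c = 0.01, after 50 generations r^2 falls to about 37% of its start value.
print(decayed_r_squared(r2, c=0.01, generations=50))
```

Under this model the distance over which SNPs stay above an r² cut-off depends on local recombination rather than on a fixed number of SNPs, which fits the point that no single window size suits every region.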

In the past, I used Tagger (implemented in Haploview) for tag SNP selection, and I just used the default settings.
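
Tagger's exact procedure is not reproduced here, but the general idea of greedy pairwise tagging can be sketched in Python. Assumptions: pairwise r² values are already computed (for example from a windowed scan), `greedy_tags` is a hypothetical helper, and the r² > 0.7 cut-off mirrors the question rather than any Tagger default.

```python
def greedy_tags(snps: list[str], r2: dict[frozenset, float],
                threshold: float = 0.7) -> list[str]:
    """Pick tags so every SNP either is a tag or has r^2 > threshold with one.

    `r2` maps frozenset({snp1, snp2}) to their pairwise r^2; missing pairs count as 0.
    """
    def captures(tag: str) -> set[str]:
        # A tag always captures itself, plus any SNP above the r^2 cut-off.
        return {s for s in snps if s == tag or r2.get(frozenset((tag, s)), 0.0) > threshold}

    uncaptured = set(snps)
    tags: list[str] = []
    while uncaptured:
        # Greedy step: choose the SNP that covers the most still-uncaptured SNPs;
        # max() keeps the first such SNP in input order when there is a tie.
        best = max(snps, key=lambda s: len(captures(s) & uncaptured))
        tags.append(best)
        uncaptured -= captures(best)
    return tags


r2 = {
    frozenset(("rs1", "rs2")): 0.95,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs1", "rs3")): 0.60,
}
print(greedy_tags(["rs1", "rs2", "rs3", "rs4"], r2))  # ['rs2', 'rs4']
```

The greedy choice is a set-cover heuristic, so it gives a small tag set but not necessarily the smallest possible one.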

'Hurried' analyses have the tendency to run aground, I have found.

Kevin