I am a little confused by some PLINK output I'm producing. It was my understanding that an --indep-pairwise analysis with an r2 cutoff would identify all the SNPs in a .ped file with an r2 correlation value > .7:
$ ./plink --file input.plink --indep-pairwise 225 23 .7 --out plinkoutput1
SNPs with an r2 > .7 would have their IDs saved to a file 'prune.in', while those with an r2 < .7 would be saved to a second file called 'prune.out'. Now, if you want to get the tagging SNPs and a list of all the SNPs they tag, you run something along the lines of
$ plink --file input.plink --show-tags my_tags.txt --list-all --tag-r2 .7 --out mytags_and_what_they_tag
So it makes sense that you could use the prune.in list of SNPs as the list of tags, right?
$ plink --file input.plink --show-tags plinkoutput1.prune.in --list-all --tag-r2 .7 --out mytags_and_what_they_tag
and in the output every SNP should appear either as a tag for something or as something tagged by another SNP. But when I look at the file:
SNP         CHR  BP       NTAG  LEFT     RIGHT    KBSPAN  TAGS
rs79687004  2    5501148  0     5501148  5501148  0       NONE
rs1404257   2    5503200  1     5498227  5503200  4.973   rs10929518
rs61268182  2    5504924  2     5491746  5504924  13.178  rs72776977|rs955146
rs17356301  2    5506274  0     5506274  5506274  0       NONE
I see things like rs79687004. In its 'TAGS' column it has NONE, so it's not serving as a tagging SNP for anything. Fair enough, but when I search for it in the file, this is the ONLY instance of 'rs79687004'. It's not listed in some other SNP's TAGS column either. Here's where I am confused: if everything in prune.in is a SNP that correlated with something else at an r2 > .7, wouldn't every single SNP be identified either as tagging something or as tagged by something in this output? I.e., shouldn't a SNP that is not tagging anything, in addition to having its own line in the output, ALSO appear in some other line as a tagged SNP?
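For concreteness, the check I did by hand amounts to something like the rough Python sketch below. It is only an illustration: the function name `find_orphans` is my own and not part of PLINK, and I am assuming the output is whitespace-delimited with the SNP and TAGS columns shown above and '|' separating multiple tags.

```python
def find_orphans(lines):
    """Return SNPs whose TAGS column is NONE and that no other row lists as a tag.

    Assumes a whitespace-delimited table with a header row containing
    'SNP' and 'TAGS', and '|' separating multiple tagged SNPs.
    """
    rows = [ln.split() for ln in lines if ln.strip()]
    header, records = rows[0], rows[1:]
    snp_col = header.index("SNP")
    tags_col = header.index("TAGS")

    tagged = set()        # every SNP that appears in someone's TAGS column
    not_tagging = []      # SNPs whose own TAGS column is NONE
    for rec in records:
        tags = rec[tags_col]
        if tags == "NONE":
            not_tagging.append(rec[snp_col])
        else:
            tagged.update(tags.split("|"))

    # An "orphan" neither tags anything nor is tagged by anything in this file.
    return [snp for snp in not_tagging if snp not in tagged]


# The four rows quoted above, used as sample input.
sample = """\
SNP CHR BP NTAG LEFT RIGHT KBSPAN TAGS
rs79687004 2 5501148 0 5501148 5501148 0 NONE
rs1404257 2 5503200 1 5498227 5503200 4.973 rs10929518
rs61268182 2 5504924 2 5491746 5504924 13.178 rs72776977|rs955146
rs17356301 2 5506274 0 5506274 5506274 0 NONE
"""

orphans = find_orphans(sample.splitlines())
print(orphans)  # ['rs79687004', 'rs17356301']
```

On the real file you would pass the lines of the --show-tags output instead of `sample`. In my case this kind of check turns up a lot of SNPs, not just one or two.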
The only explanation I can think of is that the scanning window I used for --indep-pairwise was 225 SNPs, and this file is a collection of some distant loci on a single chromosome. So maybe there were instances where the --indep-pairwise window fell over a VERY distantly located set of SNPs and found they were correlated, so they went into the 'prune.in' file. But then, during the --show-tags analysis, there is a default 250 kb 'window', and SNPs further apart than that are not examined for tagging. Still, I find it hard to swallow that I would get so many lines where a SNP is neither tagging NOR tagged just because some comparisons won't be made the second time around in --show-tags.
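One way to sanity-check the window idea is to count how many other SNPs sit within the tagging window of an untagged SNP. The sketch below is hypothetical helper code, not anything PLINK provides: `neighbours_within` is a name I made up, the window size is a plain parameter you should set to whatever --tag-kb applies to your run, and it only compares base-pair positions without computing any r2.

```python
def neighbours_within(positions, snp, window_kb):
    """Return the other SNPs whose position is within window_kb of `snp`.

    positions: dict mapping SNP ID -> base-pair position (same chromosome).
    window_kb: window size in kilobases (an assumption; match your --tag-kb).
    """
    centre = positions[snp]
    limit_bp = window_kb * 1000
    return sorted(
        other
        for other, bp in positions.items()
        if other != snp and abs(bp - centre) <= limit_bp
    )


# Positions taken from the BP column of the rows quoted earlier.
positions = {
    "rs79687004": 5501148,
    "rs1404257": 5503200,
    "rs61268182": 5504924,
    "rs17356301": 5506274,
}

# 250 is an assumed window size for illustration; adjust to your own run.
nearby = neighbours_within(positions, "rs79687004", window_kb=250)
print(nearby)  # ['rs1404257', 'rs17356301', 'rs61268182']
```

In the rows I pasted, the four SNPs all lie within about 5 kb of each other, so at least for these the distance window alone does not seem to account for the NONE entries.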
What am I missing here? I've seen other people's output from the same process posted with "NONE" in the TAGS column, but no one seems to bat an eye. I feel like I am overlooking something obvious, and I was hoping you could point it out to me.
Many thanks in advance!