PLINK extracting more variants than in file when calculating LD (with --extract modifier).
0
0
Entering edit mode
3.2 years ago
raulghose • 0

EDIT: I tried the same command with PLINK 1.07 and it works fine, but significantly slower. Another issue is that I cannot get D' values using 1.07, I can only get the r2 values.

NOTE: I am very new to PLINK.

I was trying to use PLINK 1.9 to get D' values between all variants in my $snpfile (code below). The file contains 2770 variants but for some reason, the PLINK logs show that it is extracting 2771 variants. Due to this, the resultant table returned is not what I am expecting. Does anyone have an idea as to why this might be happening?

The command I used is as follows:

$plink \
   --bfile $gfile \
   --keep $indfile \
   --extract $snpfile \
   --r2 dprime\
   --ld-window $ldwin \
   --ld-window-kb 100000 \
   --ld-window-r2 0 \
   --out $ldmatrixfile

I extended the --ld-window-kb to include a very large region so I can get the pairwise-LD value between every SNP in my list. The $ldwin value equals the number of SNPs.

This is the log file output (I have removed any absolute paths):

PLINK v1.90b6.18 64-bit (16 Jun 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to Chrom19/LD_mat219.log.
Options in effect:
  --bfile path/to/bed,bim,fam files
  --extract Chrom19/snps19.txt
  --keep Chrom19/controls19.txt
  --ld-window 2770
  --ld-window-kb 100000
  --ld-window-r2 0
  --out Chrom19/LD_mat219
  --r2 dprime

773942 MB RAM detected; reserving 386971 MB for main workspace.
190525 variants loaded from .bim file.
333531 people (0 males, 0 females, 333531 ambiguous) loaded from .fam.
Ambiguous sex IDs written to Chrom19/LD_mat219.nosex .
--extract: 2771 variants remaining.
--keep: 10000 people remaining.
Warning: At least 29994 duplicate IDs in --keep file.
Using up to 23 threads (change this with --threads).
Before main variant filters, 10000 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in remaining samples is 0.991546.
2771 variants and 10000 people pass filters and QC.
Note: No phenotypes present.
--r2 dprime... done.          
Results written to Chrom19/LD_mat219.ld .
LD SNP PLINK • 1.4k views
ADD COMMENT
0
Entering edit mode

Are you sure the $snpfile contains that number of variants? What's the result of wc -l $snpfile?

ADD REPLY
0
Entering edit mode

Yes I have confirmed this. The file contains 2770 variants.

$ wc -l snps19.txt 
2770 snps19.txt
ADD REPLY
0
Entering edit mode

Are you sure your .bim variant IDs are unique?

ADD REPLY
0
Entering edit mode

There are some duplicates in the .bim file, but the separate list of variants that I am providing with the --extract modifier has no duplicates. So I assumed that PLINK would filter based on the list of SNPs I provide in the --extract command. Do you think this might be an issue?

Furthermore, if there were duplicates, wouldn't I expect a lower number of variants being analysed by PLINK instead of a higher number?

ADD REPLY
0
Entering edit mode

Your second paragraph is backwards. If one of the variant IDs in your —extract files appears twice in the .bim, and the other 2769 variants appear once each, —extract would select 2771 variants.

ADD REPLY

Login before adding your answer.

Traffic: 1488 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6