Correct command line syntax for plink2
3.4 years ago
dec986 ▴ 370

Hello,

I'm running PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020) and I cannot get the command line syntax correct for LD pruning.

I've been reading https://avikarn.com/2019-07-30-prunning/ and https://www.cog-genomics.org/plink/2.0/ld but am confused about what I've read there. There are no examples in the manual pages, which isn't very helpful.

First, I convert the VCF files like this:

vcftools --gzvcf $vcf --plink --out $dir/$id

I've tried

~/Scripts/plink2 --noweb --file plink/MDMNFYMQ --indep 50 5 2 --out MDMNFYMQ.indep

which I got from the tutorial, but this gives the error:

Error: Unrecognized flag ('--file').

That seems strange, so I tried

~/Scripts/plink2 --noweb plink/MDMNFYMQ --indep-pairwise 50 5 2 --out MDMNFYMQ.indep

but then I get

Error: Invalid --indep-pairwise r^2 threshold '2'. For more info, try "plink2 --help <flag name>" or "plink2 --help | more".

I'm totally stuck.

How can I run LD pruning with plink2?

3.4 years ago
  1. Use plink 1.9 for this. --file and --indep are not implemented in plink 2.0 yet.
  2. With both plink 1.9 and plink 2.0, VCF files should be imported directly with --vcf. vcftools --plink is very inefficient and lossy in comparison.
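
As a rough sketch of the two points above, the commands for a direct VCF import followed by pairwise LD pruning in plink 1.9 might look like the following. The file name cohort.vcf.gz, the output prefix, and the 50/5/0.2 settings are placeholders for illustration, not values from this thread, and `plink` is assumed to be the 1.9 binary on the PATH. Note that the third argument of --indep-pairwise is an r^2 threshold, so it has to lie between 0 and 1; a value of 2 only makes sense as the VIF threshold of --indep, which is why it was rejected above.

```shell
#!/bin/sh
# Sketch only: direct VCF import plus pairwise LD pruning with PLINK 1.9.
# Assumptions: `plink` is PLINK 1.9 on PATH; cohort.vcf.gz is a hypothetical
# multi-sample VCF; the window/step/r^2 values are illustrative.
vcf="cohort.vcf.gz"
out="cohort"
window=50   # window size, in variant count
step=5      # number of variants to shift the window at each step
r2=0.2      # r^2 threshold; must be between 0 and 1

import_cmd="plink --vcf $vcf --make-bed --out $out"
prune_cmd="plink --bfile $out --indep-pairwise $window $step $r2 --out $out.indep"

# Always show the commands; run them only if plink and the input are present.
echo "$import_cmd"
echo "$prune_cmd"
if command -v plink >/dev/null 2>&1 && [ -f "$vcf" ]; then
    $import_cmd && $prune_cmd
fi
```

If the run succeeds, the retained and removed variant IDs are written to files with the .prune.in and .prune.out extensions under the --out prefix.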

I'm having a really hard time getting all of the plink input files ready.

~/Scripts/plink1.9/plink --vcf genetic-data/ZZFNMDMF.vcf.gz --out ZZFNMDMF --make-founders --make-bed

generates .nosex, .log, .bed, .fam, and .bim files. However, I cannot generate the .ped file that is also required:

703404669@bioitutil2:~/covid_study2065$ ~/Scripts/plink1.9/plink --vcf genetic-data/ZZFNMDMF.vcf.gz --out ZZFNMDMF --recode compound-genotypes
PLINK v1.90b6.18 64-bit (16 Jun 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to ZZFNMDMF.log.
Options in effect:
  --out ZZFNMDMF
  --recode compound-genotypes
  --vcf genetic-data/ZZFNMDMF.vcf.gz

128738 MB RAM detected; reserving 64369 MB for main workspace.
--vcf: ZZFNMDMF-temporary.bed + ZZFNMDMF-temporary.bim + ZZFNMDMF-temporary.fam
written.
661125 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to ZZFNMDMF.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.925025.
661125 variants and 1 person pass filters and QC.
Note: No phenotypes present.
Error: --recode compound-genotypes cannot be used with multi-character allele
names.

Is there an alternative to Plink that doesn't require so much error-prone preparation?

  1. No .ped file is required (and .ped files should almost always be avoided in 2020 since they're horribly inefficient; this is why --file has been deliberately excluded from all plink 2.0 alpha builds). Use --bfile instead of --file to load the .bed + .bim + .fam fileset.
  2. With that said, LD pruning (and several other standard QC operations) requires a dataset with at least ~50 samples. It is not applicable to a single-person dataset.
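
One way to act on both points is to check the sample count before attempting to prune. The sketch below counts the lines of the .fam file (one line per sample) and only calls plink2 with --bfile when there are enough samples. The helper names and the cutoff of 50, taken from the "~50 samples" remark above, are illustrative assumptions, as are the 50/5/0.2 pruning settings.

```shell
#!/bin/sh
# Sketch only: skip LD pruning when the binary fileset has too few samples.
# Assumptions: `plink2` is on PATH; count_samples/enough_samples are
# hypothetical helpers; the cutoff and pruning parameters are illustrative.
min_samples=50

# A .fam file holds one line per sample, so its line count is the sample count.
count_samples() {
    wc -l < "$1" | tr -d ' '
}

# Succeeds when the .fam file lists at least $min_samples samples.
enough_samples() {
    [ "$(count_samples "$1")" -ge "$min_samples" ]
}

prefix="ZZFNMDMF"
if [ -f "$prefix.fam" ] && enough_samples "$prefix.fam"; then
    # --bfile loads the .bed + .bim + .fam fileset; no .ped file is needed.
    plink2 --bfile "$prefix" --indep-pairwise 50 5 0.2 --out "$prefix.indep"
else
    echo "skipping LD pruning: $prefix.fam is missing or has fewer than $min_samples samples"
fi
```

For the single-person fileset in this thread, the check fails and the script only prints the skip message.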