Question: plink prunes variants according to the log file, but the .prune.in file is full of dots
21 months ago by
ana.agapito.v50 wrote:


I'm working with a large SNP dataset (a 160 GB VCF file) that I pruned with plink:

plink --vcf SNPs_clean1.vcf --allow-extra-chr --indep-pairwise 50 10 0.1 --out SNP_50_10_01

It runs, producing the temporary files and the output files (.prune.in and .prune.out). The log file (attached below) shows that the variants were filtered: 49595998 of 57089576 variants removed. When I count the number of lines in the file, it does match the number of variants removed.

The problem is that when I open the .prune.in file it's just a bunch of dots (one dot per line). The same happens for the .prune.out file. I've looked for similar errors on this site but I've only seen reports of blank .prune.in and .prune.out files, so I hope this isn't a repeated question.

Thanks in advance for your help, Ana

Log file:

PLINK v1.90b5.3 64-bit (21 Feb 2018)

Options in effect:

  --indep-pairwise 50 10 0.1
  --out SNP_50_10_01
  --vcf SNPs_camsud_clean1.vcf

Hostname: XXXXXX
Working directory: /home/aagapito/POTW/mapeo_vicugna/cardiff
Start time: Tue Nov 13 08:23:34 2018

Random number seed: 1542108214
257764 MB RAM detected; reserving 128882 MB for main workspace.
--vcf: SNP_50_10_01-temporary.bed + SNP_50_10_01-temporary.bim +
SNP_50_10_01-temporary.fam written.

57089576 variants loaded from .bim file.
56 people (0 males, 0 females, 56 ambiguous) loaded from .fam.
Ambiguous sex IDs written to SNP_50_10_01.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 56 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.976739.
57089576 variants and 56 people pass filters and QC.
Note: No phenotypes present.
Pruned 532 variants from chromosome 27, leaving 67.
Pruned 438334 variants from chromosome 28, leaving 68336.
Pruned 3957 variants from chromosome 29, leaving 364.
Pruned 281964 variants from chromosome 30, leaving 41240.
Pruned 247860 variants from chromosome 31, leaving 38534.
Pruned 201756 variants from chromosome 32, leaving 29178.
(the list goes on...)
**Pruning complete.  49595998 of 57089576 variants removed.**
Marker lists written to SNP_50_10_01.prune.in and SNP_50_10_01.prune.out .
End time: Tue Nov 13 08:32:21 2018
20 months ago by
United States
chrchang523 wrote:

This is because the .prune.in and .prune.out files contain variant IDs, but your VCF has all variant IDs set to ‘.’.
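
One way to confirm this is to tally the ID column (the third field) of the VCF. The following is only a minimal sketch on a made-up three-variant file; `toy.vcf` and its contents are hypothetical stand-ins for the real VCF:

```shell
# Build a tiny hypothetical VCF whose ID column (field 3) is '.' everywhere
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'scaf27\t101\t.\tA\tG\t.\tPASS\t.\n'
  printf 'scaf27\t250\t.\tC\tT\t.\tPASS\t.\n'
  printf 'scaf28\t77\t.\tG\tA\t.\tPASS\t.\n'
} > toy.vcf

# Count the distinct values in the ID column, skipping header lines
n_distinct_ids=$(grep -v '^#' toy.vcf | cut -f3 | sort -u | wc -l | tr -d ' ')
echo "distinct IDs: $n_distinct_ids"
```

If the only distinct value is `.`, every line plink writes to the marker lists will be a dot, which is what the question describes.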

plink 2.0’s --set-all-var-ids flag provides one way to assign IDs.
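
The flag takes a template string such as `@:#`, where `@` stands for the chromosome code and `#` for the base-pair position. To illustrate that naming scheme without plink, here is a rough awk sketch that fills the ID column the same way; the file names and the toy records are assumptions, and this is not what plink does internally:

```shell
# Toy VCF (hypothetical data) with every ID set to '.'
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'scaf27\t101\t.\tA\tG\t.\tPASS\t.\n'
  printf 'scaf27\t250\t.\tC\tT\t.\tPASS\t.\n'
  printf 'scaf28\t77\t.\tG\tA\t.\tPASS\t.\n'
} > toy.vcf

# Rewrite field 3 as CHROM:POS on data lines; header lines pass through unchanged
awk 'BEGIN { FS = OFS = "\t" } /^#/ { print; next } { $3 = $1 ":" $2; print }' \
  toy.vcf > toy_with_ids.vcf

# Show the newly assigned IDs, one per line
grep -v '^#' toy_with_ids.vcf | cut -f3
```

Note that a CHROM:POS scheme gives identical IDs to variants that share a position, so a dataset with such records may need the alleles in the ID as well.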

20 months ago by
Universidad Nacional Autónoma de México - University of Bath
yoce_pf50 wrote:

I solved the same problem this way:

1. First, I obtained the list of variants in linkage disequilibrium (r2 coefficient)

plink --noweb --vcf file.vcf --no-sex --maf 0.05 --recode --allow-extra-chr --r2 --ld-window-kb 1 --ld-window 1000 --ld-window-r2 0 --out file_vcf_ld

From the resulting .ld file, I kept the rows whose last column (r2 coefficient) is >= 0.7, skipping the header line.

awk 'NR > 1 && $NF >= 0.7' file_vcf_ld.ld > filter_snp_ld.txt

Then create a list of those positions (based on columns 1, 2, 4 and 5).

2. Remove those positions from the original file.vcf

grep -Fwvf filter_snp_ld.txt file.vcf > file_filter_ld.vcf

(It may take a long time)
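
Steps 1 and 2 can be sketched end to end on toy data. This is only an illustration under stated assumptions: the file names and records are made up, the .ld layout is assumed to be CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2, and the VCF is filtered with an awk lookup on CHROM and POS rather than grep -Fwvf, because a fixed-string match on a bare position can also hit the same number elsewhere in a line.

```shell
# Hypothetical plink --r2 output (whitespace-aligned, with a header line)
cat > toy.ld <<'EOF'
 CHR_A  BP_A SNP_A  CHR_B  BP_B SNP_B    R2
scaf27   101     . scaf27   250     .  0.91
scaf27   250     . scaf27   400     .  0.12
EOF

# Hypothetical VCF; scaf28:101 shares a position number with scaf27:101
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'scaf27\t101\t.\tA\tG\t.\tPASS\t.\n'
  printf 'scaf27\t250\t.\tC\tT\t.\tPASS\t.\n'
  printf 'scaf27\t400\t.\tG\tA\t.\tPASS\t.\n'
  printf 'scaf28\t101\t.\tT\tC\t.\tPASS\t.\n'
} > toy.vcf

# Both positions of every pair with r2 >= 0.7 (columns 1, 2, 4, 5), header skipped
awk 'NR > 1 && $NF >= 0.7 { print $1 "\t" $2; print $4 "\t" $5 }' toy.ld \
  | sort -u > ld_positions.txt

# Keep header lines and every variant whose CHROM and POS are not in the list
awk 'BEGIN { FS = "\t" }
     NR == FNR { drop[$1 FS $2]; next }
     /^#/ || !(($1 FS $2) in drop)' ld_positions.txt toy.vcf > toy_filter_ld.vcf
```

As in the steps above, this drops both members of each high-r2 pair, which is more aggressive than --indep-pairwise, where one variant of the pair is kept. The two-file awk idiom also assumes the position list is not empty, so it is worth checking that file before filtering.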

3. With the new VCF (file_filter_ld.vcf), perform a second filter based on minor allele frequency (--maf)

plink --noweb --vcf file_filter_ld.vcf --allow-extra-chr --no-sex --maf 0.05 --out new_file_filtered_ld_maf --make-bed

I hope this solution helps.

17 months ago by
cicindel20 wrote:

I wrote a script to solve exactly this; you can find it here as a gist. In the script I also explain what's going on.

