Question: plink does pruning according to log file but prune.in is full of dots
ana.agapito.v wrote (9 months ago):

Hi,

I have a large SNP dataset (a 160 GB VCF file) that I pruned with plink:

plink --vcf SNPs_clean1.vcf --allow-extra-chr --indep-pairwise 50 10 0.1 --out SNP_50_10_01

It starts running, producing the temporary files and the output files (prune.in and prune.out). According to the log file attached below, the variants were filtered: 49595998 of 57089576 variants removed. When I count the lines in the prune.in file, the number does match the number of variants removed.

The problem is that when I open the file, it's just a bunch of dots (one dot per line). The same happens for the prune.out file. I've looked for similar errors on this site, but I've only seen reports of blank prune.in and prune.out files, so I hope this isn't a repeated question.

Thanks in advance for your help, Ana

Log file:

PLINK v1.90b5.3 64-bit (21 Feb 2018)

Options in effect:

  --allow-extra-chr
  --indep-pairwise 50 10 0.1
  --out SNP_50_10_01
  --vcf SNPs_camsud_clean1.vcf

Hostname: XXXXXX
Working directory: /home/aagapito/POTW/mapeo_vicugna/cardiff
Start time: Tue Nov 13 08:23:34 2018

Random number seed: 1542108214
257764 MB RAM detected; reserving 128882 MB for main workspace.
--vcf: SNP_50_10_01-temporary.bed + SNP_50_10_01-temporary.bim +
SNP_50_10_01-temporary.fam written.

57089576 variants loaded from .bim file.
56 people (0 males, 0 females, 56 ambiguous) loaded from .fam.
Ambiguous sex IDs written to SNP_50_10_01.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 56 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.976739.
57089576 variants and 56 people pass filters and QC.
Note: No phenotypes present.
Pruned 532 variants from chromosome 27, leaving 67.
Pruned 438334 variants from chromosome 28, leaving 68336.
Pruned 3957 variants from chromosome 29, leaving 364.
Pruned 281964 variants from chromosome 30, leaving 41240.
Pruned 247860 variants from chromosome 31, leaving 38534.
Pruned 201756 variants from chromosome 32, leaving 29178.
(the list goes on...)
**Pruning complete.  49595998 of 57089576 variants removed.**
Marker lists written to SNP_50_10_01.prune.in and SNP_50_10_01.prune.out .
End time: Tue Nov 13 08:32:21 2018

chrchang523 wrote (8 months ago):

This is because the .prune.in and .prune.out files contain variant IDs, but your VCF has all variant IDs set to ‘.’.
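
A quick way to confirm this diagnosis is to count how many VCF records have '.' in the ID column (field 3). The following is a minimal sketch on a toy file; `check.vcf` and `id_summary.txt` are hypothetical names, and on the real 160 GB file the same awk command would simply stream through it once.

```shell
# Toy VCF standing in for the real file (hypothetical data; IDs are all '.').
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\n' > check.vcf
printf 'chr27\t101\t.\tA\tG\nchr27\t250\t.\tC\tT\n' >> check.vcf

# Skip header lines, then count records whose ID column is exactly '.'.
awk -F'\t' '!/^#/ { total++; if ($3 == ".") missing++ }
            END   { print missing + 0, "of", total + 0, "records have ID ." }' \
    check.vcf > id_summary.txt

cat id_summary.txt
```

If every record is reported as missing an ID, the dots in prune.in and prune.out are just those IDs being written out.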

plink 2.0’s --set-all-var-ids flag provides one way to assign IDs.
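
With plink 2.0 this would look something like `plink2 --vcf SNPs_clean1.vcf --allow-extra-chr --set-all-var-ids @:# --make-bed --out SNPs_with_ids` (here `@` stands for the chromosome and `#` for the base-pair position; the output name is made up, and the exact options are worth checking against the plink 2.0 documentation). If plink 2.0 is not available, the ID column can also be filled in directly. The sketch below does that with awk on a toy file; the file names and the CHROM:POS naming scheme are assumptions, and the scheme will not give unique IDs if several records share a position.

```shell
# Toy VCF with one missing ID and one existing ID (hypothetical data).
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > toy.vcf
printf 'chr27\t101\t.\tA\tG\t.\tPASS\t.\n' >> toy.vcf
printf 'chr27\t250\trs1\tC\tT\t.\tPASS\t.\n' >> toy.vcf

# Replace each missing ID with CHROM:POS.
# Header lines and records that already have an ID pass through unchanged.
awk 'BEGIN { FS = OFS = "\t" }
     /^#/      { print; next }
     $3 == "." { $3 = $1 ":" $2 }
     { print }' toy.vcf > toy_with_ids.vcf

# Show the resulting ID column of the data lines.
grep -v '^#' toy_with_ids.vcf | cut -f3
```

Running the original `--indep-pairwise` command on the rewritten VCF should then produce prune.in and prune.out files that list these IDs instead of dots.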

yoce_pf wrote (8 months ago):

I solved the same problem in this way:

1. First, I obtained the list of variants in linkage disequilibrium (r2 coefficient)

plink --noweb --vcf file.vcf --no-sex --maf 0.05 --recode --allow-extra-chr --r2 --ld-window-kb 1 --ld-window 1000 --ld-window-r2 0 --out file_vcf_ld

From the resulting .ld file, I kept the rows whose last column (the r2 coefficient) is >= 0.7.

awk '{ if ($NF >= 0.7) print $0 }' file_vcf_ld.ld > filter_snp_ld.txt

Then create a list of those positions (based on columns 1, 2, 4 and 5).

2. Remove those positions from the original file.vcf

grep -Fwvf filter_snp_ld.txt file.vcf > file_filter_ld.vcf

(It may take a long time)

3. With the new VCF file, apply a second filter based on minor allele frequency (--maf)

plink --noweb --vcf file_filter_ld.vcf --allow-extra-chr --no-sex --maf 0.05 --out new_file_filtered_ld_maf --make-bed
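
The step "create a list with those positions" and the removal step could be sketched as follows. This is only an illustration on toy data, not the poster's actual commands: the file names are hypothetical, the .ld layout is assumed to be PLINK's usual CHR_A BP_A SNP_A CHR_B BP_B SNP_B R2, and matching is done on exact chromosome and position rather than with grep. Note that this removes both SNPs of every high-LD pair, which is not the same as `--indep-pairwise` pruning, where one SNP of each pair is retained.

```shell
# Toy stand-ins for the .ld output and the VCF (hypothetical data).
cat > toy.ld <<'EOF'
 CHR_A  BP_A SNP_A  CHR_B  BP_B SNP_B    R2
     1   100     .      1   150     .  0.95
     1   100     .      1   300     .  0.10
EOF
printf '#CHROM\tPOS\tID\tREF\tALT\n1\t100\t.\tA\tG\n1\t150\t.\tC\tT\n1\t300\t.\tG\tA\n' > toy2.vcf

# Skip the header row, keep pairs with r2 >= 0.7, and emit CHROM<TAB>POS
# for both members of each pair (columns 1, 2, 4 and 5).
awk 'NR > 1 && $NF >= 0.7 { print $1 "\t" $2; print $4 "\t" $5 }' toy.ld \
    | sort -u > ld_positions.txt

# Load the position list, then print header lines and every record
# whose CHROM/POS pair is not in the list.
awk -F'\t' -v list=ld_positions.txt '
    BEGIN { while ((getline line < list) > 0) drop[line] }
    { key = $1 "\t" $2 }
    /^#/ || !(key in drop)' toy2.vcf > toy_filtered.vcf

cat toy_filtered.vcf
```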

I hope this solution helps.

cicindel wrote (5 months ago):

I wrote a script to solve exactly this; you can find it here as a gist. In the script I also explain what's going on.
