Hello everyone,
I'm using PLINK v1.9 to test several r² cutoffs from 0.1 to 0.99. However, from 0.8 to 0.99 the same number of SNPs is extracted: about 450,000 SNPs (out of a total of about 520,000) are pruned. So it looks as if all the pruned SNPs have an r² > 0.99 with a neighbouring SNP.
I'm really new to this subject; does this seem plausible to you?
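As a sanity check on the "all r² > 0.99" reading: my understanding is that PLINK's pairwise pruning compares SNPs by the squared correlation of their genotype (0/1/2 allele-count) vectors. The hypothetical shell/awk function `snp_r2` below computes that quantity for two SNPs given as comma-separated genotypes, one value per sample. It is only a sketch under that assumption (no missing-genotype handling, not PLINK's actual code). It shows that two SNPs with identical, or exactly mirrored, genotypes across all samples get r² = 1, so one of them would be removed at every threshold up to 0.99.

```shell
#!/usr/bin/env bash
# Hypothetical helper: squared Pearson correlation (r²) between two SNPs,
# each given as comma-separated 0/1/2 allele counts (one per sample).
# Assumption: this mirrors the genotype-based r² used for pruning.
snp_r2() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ","); split(b, y, ",")
    for (i = 1; i <= n; i++) { sx += x[i]; sy += y[i] }
    mx = sx / n; my = sy / n
    for (i = 1; i <= n; i++) {
      dx = x[i] - mx; dy = y[i] - my
      sxy += dx * dy; sxx += dx * dx; syy += dy * dy
    }
    # A monomorphic SNP has zero variance, so r² is undefined
    if (sxx == 0 || syy == 0) { print "NA"; exit }
    printf "%.4f\n", (sxy * sxy) / (sxx * syy)
  }'
}

snp_r2 "0,1,2,1,0,2" "0,1,2,1,0,2"   # identical genotypes: 1.0000
snp_r2 "0,1,2,1,0,2" "2,1,0,1,2,0"   # mirrored genotypes: 1.0000
snp_r2 "0,1,2,1,0,2" "0,0,1,2,1,2"   # partial correlation: 0.2500
```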
My SNPs were called with GATK and imputed with Beagle 5 without reference haplotypes. To do the pruning I'm running the following script:
#!/bin/bash
# Script to do LD pruning with different r² thresholds
# argument 1: file to prune (bgzipped VCF, .vcf.gz, since vcftools reads it with --gzvcf)
# argument 2: directory where the results will be stored

# Create the directory where the results will be stored
mkdir -p "$2/Pruned"
# Create the chromosome map (each contig name mapped to itself)
bcftools view -H "$1" | cut -f 1 | uniq | awk '{print $0"\t"$0}' > chrom-map.txt
# Convert the VCF to PLINK format (.ped/.map)
vcftools --gzvcf "$1" --plink --out "$2/Pruned/plink.data" --chrom-map chrom-map.txt
# Pruning: window of 200 SNPs, step of 10 SNPs, r² threshold $r2
for r2 in 0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1; do
    plink --file "$2/Pruned/plink.data" --indep-pairwise 200 10 "$r2" \
        --out "$2/Pruned/pruned.$r2" --allow-extra-chr
    # Keep only the SNPs listed in .prune.in and write them back to VCF
    plink --file "$2/Pruned/plink.data" --extract "$2/Pruned/pruned.$r2.prune.in" \
        --recode vcf --out "$2/Pruned/pruned.$r2" --allow-extra-chr
    bgzip "$2/Pruned/pruned.$r2.vcf"
done
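The chrom-map step in the script relies on `uniq`, which only collapses *adjacent* duplicate lines; that works because VCF records are grouped by contig. The hypothetical function `make_chrom_map` below wraps the same `cut | uniq | awk` pipeline so it can be tried on a few fake VCF body lines. My understanding (an assumption, worth checking in the vcftools manual) is that vcftools uses this two-column file to translate contig names for the PLINK `.map`, so an identity map keeps the original names, which is why PLINK then needs `--allow-extra-chr`.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the chrom-map pipeline of the script.
# Reads VCF body lines (no header) on stdin and prints "name<TAB>name".
make_chrom_map() {
  # Column 1 is CHROM; uniq only removes consecutive repeats,
  # so the input must be grouped by contig, as in a sorted VCF
  cut -f 1 | uniq | awk '{print $0"\t"$0}'
}

# Toy VCF body lines (CHROM and POS only), grouped by contig
printf 'chr1\t100\nchr1\t250\nscaffold_7\t42\n' | make_chrom_map
# prints:
# chr1<TAB>chr1
# scaffold_7<TAB>scaffold_7
```

If the input were not grouped by contig, `cut -f 1 | sort -u` would be the order-independent variant of the same step.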
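To see exactly where the counts stop changing between thresholds, a small helper can tabulate the line counts of the `.prune.in` (kept) and `.prune.out` (removed) files that `plink --indep-pairwise` writes for each r². This is a sketch: the function name `summarise_pruning` is hypothetical, and it assumes the `<dir>/Pruned/pruned.<r2>` output prefix used in the script above, with `<dir>` being the script's second argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print kept/pruned SNP counts for each r² threshold.
# Assumes files named <dir>/Pruned/pruned.<r2>.prune.in and .prune.out.
summarise_pruning() {
  local outdir="$1/Pruned" r2 kept removed
  printf 'r2\tkept\tpruned\n'
  for r2 in 0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1; do
    # Skip thresholds for which plink produced no output
    [ -f "$outdir/pruned.$r2.prune.in" ] || continue
    # $(( )) strips the leading spaces some wc implementations print
    kept=$(( $(wc -l < "$outdir/pruned.$r2.prune.in") ))
    removed=$(( $(wc -l < "$outdir/pruned.$r2.prune.out") ))
    printf '%s\t%s\t%s\n' "$r2" "$kept" "$removed"
  done
}

# Usage: summarise_pruning /path/passed/as/argument/2
```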
Thanks in advance for your answers,
Xillanne