Question

PLINK Recoding Issue

1

Entering edit mode

7.1 years ago

reds.nik ▴ 40

Good evening, I working on whole genome seq. data and I am struggling a bit with plink. After having merged all my VCF file into a plink .bed file, I need to exclude the SNPs which are not in HW equilibrium. By merging many whole genome seq VCF files my plink file is full of NA, so I need to encode my SNPs as 0/1/2 (to set the missing as homozygous for the reference allele) before verifying the HW condition. After that I need to recode my file as traw file (as an output from plink --recode A-transpose, which has the individuals in columns and a line for each SNP). I would need something like recode12 --fill-missing-a2, I guess, but recode12 output just a ped (or a tped) file, which is not what I need. Does anyone know a possible solution? In few words I need to encode my plink .bed file as 0/1/2 (2 stading for homozygosity for the minor allele, and 0 to fill all the NA), in order to check the HW eq. and then convert it to a matrix with an individual fro each column and a SNP for each line). Thanks

genome • 4.6k views

ADD COMMENT • link updated 6.7 years ago by Dan D 7.4k • written 7.1 years ago by reds.nik ▴ 40

0

Entering edit mode

Did u try th recode A-transpose after the recode12 --fill-missing-a2?

ADD REPLY • link 7.0 years ago by qlouredo • 0

0

Entering edit mode

Thanks stolarek.ir for your answer. Your solution seems interesting and I will definitely try it next time. However working on quite a big set of whole genome sequencing file I didn't have the possibility (in term of time and disk space) to genotype all the vcf against all the positions present in all the samples and we decided to move forward at this stage without filtering out SNPs out of HW equilibrium. But thanks anyway for your solution.

ADD REPLY • link 6.7 years ago by reds.nik ▴ 40

0

Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLY • link 6.7 years ago by GenoMax 144k

score 2 · Accepted Answer · 2017-08-07

the way I do it is to first genotype the vcf files I have with a common set of SNPs (in other words genotype against a list of SNP positions, so that every individual genotyped is going to have the same number of markers).

bedtools intersect -a my.vcf -b bed_present_snps -wa -header > out.vcf

the annotate the file:

cat my_out.vcf | vcf-annotate -a vcf_annotation_file_sorted.gz -c CHROM,FROM,TO,ID,-,-,- > annotated.vcf

and move it to plink format:

./plink --double-id --vcf my.vcf --recode --out my_plink

from there it's just converting to binary and merging with other files / filtering:

./plink --file my_plink --make-bed --out plink_binary

./plink --file file_with_all_snps --bmerge my_plink_binary --make-bed --out plink_merged

now the missing markers should already be set as 0 0, and you can recode however you want to suit your liking