Question: PLINK Recoding Issue
gravatar for reds.nik
3.7 years ago by
reds.nik40 wrote:

Good evening, I working on whole genome seq. data and I am struggling a bit with plink. After having merged all my VCF file into a plink .bed file, I need to exclude the SNPs which are not in HW equilibrium. By merging many whole genome seq VCF files my plink file is full of NA, so I need to encode my SNPs as 0/1/2 (to set the missing as homozygous for the reference allele) before verifying the HW condition. After that I need to recode my file as traw file (as an output from plink --recode A-transpose, which has the individuals in columns and a line for each SNP). I would need something like recode12 --fill-missing-a2, I guess, but recode12 output just a ped (or a tped) file, which is not what I need. Does anyone know a possible solution? In few words I need to encode my plink .bed file as 0/1/2 (2 stading for homozygosity for the minor allele, and 0 to fill all the NA), in order to check the HW eq. and then convert it to a matrix with an individual fro each column and a SNP for each line). Thanks

genome • 2.6k views
ADD COMMENTlink modified 3.3 years ago by Dan D7.2k • written 3.7 years ago by reds.nik40

Did u try th recode A-transpose after the recode12 --fill-missing-a2?

ADD REPLYlink written 3.6 years ago by qlouredo0

Thanks for your answer. Your solution seems interesting and I will definitely try it next time. However working on quite a big set of whole genome sequencing file I didn't have the possibility (in term of time and disk space) to genotype all the vcf against all the positions present in all the samples and we decided to move forward at this stage without filtering out SNPs out of HW equilibrium. But thanks anyway for your solution.

ADD REPLYlink written 3.3 years ago by reds.nik40

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

ADD REPLYlink written 3.3 years ago by GenoMax96k
gravatar for
3.6 years ago by
stolarek.ir660 wrote:

the way I do it is to first genotype the vcf files I have with a common set of SNPs (in other words genotype against a list of SNP positions, so that every individual genotyped is going to have the same number of markers).

bedtools intersect -a my.vcf -b bed_present_snps -wa -header > out.vcf

the annotate the file:

cat my_out.vcf | vcf-annotate -a vcf_annotation_file_sorted.gz -c CHROM,FROM,TO,ID,-,-,- > annotated.vcf

and move it to plink format:

./plink --double-id --vcf my.vcf --recode --out my_plink

from there it's just converting to binary and merging with other files / filtering:

./plink --file my_plink --make-bed --out plink_binary

./plink --file file_with_all_snps --bmerge my_plink_binary --make-bed --out plink_merged

now the missing markers should already be set as 0 0, and you can recode however you want to suit your liking

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by stolarek.ir660
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1988 users visited in the last hour