Convert plink files to VCF: Reference allele file
2
3
Entering edit mode
8.7 years ago

I am trying to convert plink files to VCF by using following commands

#!/bin/sh
# 1. have plink binary to specify reference allele
plink --noweb --bfile $plink_file --reference-allele $ref_Allele_file --make-bed --out $plink_file_modified

# 2. create plinkseq project
pseq $pseq_project new-project

#3. load plink file into plink/seq
pseq $pseq_project load-plink --file $plink_file_modified --id $plink_file_modified

#4. write out vcf file
pseq $pseq_project write-vcf | gzip > $plink_file_modified.vcf.gz

I have site-positions-file which looks like as follows

7    7672918
7    4458845
7    3013804
7    4518570
7    7216328
7    7523768

I need reference-allele-file like this

7    7672918   C
7    4458845   C
7    3013804   T
7    4518570   T
7    7216328   C
7    7523768   C

Kindly help me

SNP genome • 47k views
ADD COMMENT
14
Entering edit mode
8.7 years ago

There are several older threads on this, like

How To Convert .Ped To .Vcf File

or

How To Convert Vcf File To Plink Ped Format?

I guess the easiest way to circumvent your few steps and the use of pseq is to just download the PLINK 1.9 beta and use the recode function

plink --file your_ped_map_input --recode vcf
ADD COMMENT
0
Entering edit mode

It worked really fine. Thanks Philipp!

ADD REPLY
1
Entering edit mode

Note that plink doesn't know the reference sequence, so basically guesses the ref from the alleles.

ADD REPLY
4
Entering edit mode
7.7 years ago
willgilks ▴ 360

I found a solution just now to the differing reference alleles.

My plan was:

  1. Convert to binary plink format, and at the same time, keep only variants with 0 missing calls. I used plink for this.
  2. Convert back to vcf, also using plink.
  3. Use GATK to combine the vcfs.

As suggested above by Dan, plink automatically sets the major (common) allele as the reference allele for each population when generating the bim (map) files. I preventing this problem by using the --keep-allele-order option, in both plink commands, as shown in the code below. (Note this for a model organism, not human, and suggestions for improvement are welcome.)

## Convert to plink, and filter by call rate. __

for i in *vcf;do
    plink1.9 --allow-extra-chr --allow-no-sex \
        --vcf ${i}  \
        --make-bed \
        --geno 0 \
        --chr chr2L, chr2R, chr3L, chr3R \
        --keep-allele-order \
        --out ${i%.vcf}
                done;

## Convert back to vcf (I couldn't get the input file names right so they're written in full) . ___ 

for i in hclone_lhm_only_dbSNP_biSNP dgrp2_dm6_dbSNP; do
    plink1.9 --allow-extra-chr --allow-no-sex \
    --bfile ${i} \
    --recode vcf \
    --keep-allele-order \
        --out ${i}.bas
            done;

## Combine the two vcfs. ___
 java -jar ~/SOFTWARE/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar \
        -R ${refseq} \
        -T CombineVariants \
            -V ${vcf1%.vcf}.bas.vcf \
            -V ${vcf2%.vcf}.bas.vcf \
                --unsafe LENIENT_VCF_PROCESSING \
                --genotypemergeoption UNSORTED \
                    -o comb.vcf

I hope this helps someone.

ADD COMMENT

Login before adding your answer.

Traffic: 2115 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6