Convert vcf phased data to plink
4.8 years ago
NB ▴ 960

I have a VCF file of phased haplotypes that looks like this:

##fileformat=VCFv4.0
##reference=human_b36_both.fasta
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12891 NA12892 NA12878 NA19239 NA19238 NA19240
22      47812545        rs5769818       A       G       .       PASS    GT      0|1     0|1     0|1     1|1     1|1     1|1
22      47812939        rs9616222       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1
22      47813002        rs5769819       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1
22      47813051        rs5769820       G       A       .       PASS    GT      1|0     1|0     1|0     1|0     1|1     1|1
22      47813163        rs5769821       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1

I do not have any additional files (such as ped files) to go with it.

I would like to calculate identity by state (IBS) between all pairs of individuals. Is there a way to convert this file into PLINK format, or are there any tools that can take a VCF as input and calculate IBS?

Thank you
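
For reference, the pairwise IBS being asked about can be sketched directly in Python. This is only an illustration under stated assumptions, not what PLINK, KING or SNPRelate do internally: sites are treated as biallelic, GT is assumed to be the first colon-separated entry of each sample field, the last N whitespace-separated columns of a row are taken as the N samples named in the #CHROM header, and IBS for a pair is taken as the mean of 2 - |difference in ALT allele count| over sites where both are called, divided by 2. The function names `parse_dosages` and `pairwise_ibs` are made up for this sketch.

```python
from itertools import combinations


def alt_count(sample_field):
    """ALT-allele count (0, 1 or 2) for one sample field, or None if not callable.

    Phase is ignored: "0|1", "1|0" and "0/1" all count as 1.
    """
    alleles = sample_field.split(":")[0].replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None  # missing ("./.") or non-biallelic genotype
    return sum(int(a) for a in alleles)


def parse_dosages(vcf_text):
    """Return (sample_names, sites), where sites is a list of per-sample ALT counts."""
    samples, sites = [], []
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue
        fields = line.split()
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed VCF columns
            continue
        # Assumption: the genotype columns are the last len(samples) fields.
        genotype_fields = fields[-len(samples):]
        sites.append([alt_count(g) for g in genotype_fields])
    return samples, sites


def pairwise_ibs(samples, sites):
    """Map each sample pair to its mean proportion of alleles shared IBS."""
    result = {}
    for i, j in combinations(range(len(samples)), 2):
        shared = called = 0
        for site in sites:
            a, b = site[i], site[j]
            if a is None or b is None:
                continue  # skip sites where either sample is missing
            shared += 2 - abs(a - b)
            called += 2
        result[(samples[i], samples[j])] = shared / called if called else float("nan")
    return result
```

Reading the file with `open(path).read()` and passing the text through `parse_dosages` and then `pairwise_ibs` gives a dictionary keyed by sample pair. Because phase is discarded, `0|1` and `1|0` are treated identically, which is all that IBS needs.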

vcf plink phased-haplotype • 3.8k views
4.8 years ago
Gabriel R. ★ 2.9k

You can try glactools, it is just one line:

glactools vcfm2acf --onlyGT --fai /Data/reference/human_g1k_v37.fasta.fai /tmp/biostars/test.vcf | glactools acf2bplink - /tmp/biostars/test

You will need to edit the genetic distance column in the bim file, and probably the fam file as well.

4.8 years ago

Yes, PLINK can read VCFs: https://www.cog-genomics.org/plink/1.9/input#vcf

You may want to check some of the other parameters that can be used when converting, such as:

  • --keep-allele-order
  • --vcf-idspace-to
  • --const-fid
  • --allow-extra-chr
  • --split-x

Kevin
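
As a rough illustration of how some of these flags fit together for the IBS question, here is a small Python sketch that assembles a PLINK 1.9 command line and runs it only if the binary can be found. The file names and the helper names `plink_ibs_command` and `run_if_available` are placeholders invented for this example; `--distance square ibs` is the PLINK 1.9 option for writing an IBS matrix, and which of the other flags are actually needed depends on the data.

```python
import shutil
import subprocess


def plink_ibs_command(vcf_path, out_prefix, plink="plink"):
    """Build an argument list asking PLINK 1.9 for a square IBS matrix from a VCF."""
    return [
        plink,
        "--vcf", vcf_path,
        "--keep-allele-order",   # keep REF/ALT as given instead of major/minor
        "--const-fid",           # use a constant family ID for every sample
        "--allow-extra-chr",     # tolerate non-standard chromosome names
        "--distance", "square", "ibs",
        "--out", out_prefix,
    ]


def run_if_available(cmd):
    """Run the command if its executable is found; otherwise return None."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, check=True)
```

For example, `run_if_available(plink_ibs_command("input_genotype.vcf", "MyData"))` would, if PLINK is installed, be expected to write the matrix next to the usual log file under the `MyData` prefix.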


Thanks Kevin, I tried that but ended up with a file that has 0 for each individual.


Any log or error messages?


No. This is what I did (playing around with a chunk of chr22):

##convert vcf to plink
./plink --vcf input_genotype.vcf --keep-allele-order  --allow-extra-chr 0 --make-bed --out MyData

##tried to calculate IBS using KING
./king -b MyData.bed --ibs

The result of the above is an empty file.

Then I tried to convert the bed/bim/fam files to map and ped:

./plink --bfile MyData --recode --tab --out test

The ped file contains only 0s:

NA12891 NA12891 0       0       0       -9      0 0     0 0     0 0     0 0     0 0
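
One thing that may be worth checking, given the all-zero ped: in the excerpt as posted in the question, the #CHROM header names 15 columns, while each data row shows only 14 (nothing sits between PASS and GT, where INFO would go). That could simply be an artifact of pasting, but if the real file looks the same, a reader would find the FORMAT string one column further right than expected. A minimal sketch of such a check follows; the function name `check_vcf_columns` is made up, and it splits on any whitespace, so files whose sample IDs contain spaces would need tab splitting instead.

```python
def check_vcf_columns(vcf_text):
    """Compare each data row's column count with the #CHROM header.

    Returns (expected_count, mismatches), where mismatches is a list of
    (line_number, found_count) for rows that disagree with the header.
    """
    expected = None
    mismatches = []
    for number, line in enumerate(vcf_text.splitlines(), start=1):
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        count = len(line.split())
        if line.startswith("#CHROM"):
            expected = count
        elif expected is not None and count != expected:
            mismatches.append((number, count))
    return expected, mismatches
```

An empty mismatch list means every row has as many columns as the header; otherwise the listed line numbers point at the rows to inspect.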

If you avoid using --make-bed and instead produce a plain text PLINK dataset, can you then see data in that?


Still the same. I don't think PLINK can handle a phased haplotype file in the 0|1 format, where 0 indicates the ref allele and 1 the alt allele. Can it?


It can handle phased data, as I show in Step 5 here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2 (the 1000 Genomes data is all phased). I will move my answer back to a comment. Perhaps the PLINK developer, who is in a different time zone, will pick it up later.


Ah, for now I was able to sort out my issue using the R package SNPRelate. It takes the VCF file as input and calculates IBS, and, if needed, it can also convert the data to PLINK format!
