Question: Convert vcf phased data to plink
0
gravatar for Nandini
7 months ago by
Nandini840
Nandini840 wrote:

I have a phased haplotype format vcf file that looks like this

##fileformat=VCFv4.0
##reference=human_b36_both.fasta
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NA12891 NA12892 NA12878 NA19239 NA19238 NA19240
22      47812545        rs5769818       A       G       .       PASS    GT      0|1     0|1     0|1     1|1     1|1     1|1
22      47812939        rs9616222       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1
22      47813002        rs5769819       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1
22      47813051        rs5769820       G       A       .       PASS    GT      1|0     1|0     1|0     1|0     1|1     1|1
22      47813163        rs5769821       A       G       .       PASS    GT      0|1     0|1     0|1     1|0     1|1     1|1

I do not have additional files or ped files to this.

I would like to calculate the identity by state (IBS) between all pairs of individuals - is there a way to convert this file into plink format or are there any tools that can take in vcf to calculate IBS ?

Thank you

ADD COMMENTlink modified 7 months ago by Gabriel R.2.7k • written 7 months ago by Nandini840

Yes, PLINK can read VCFs: https://www.cog-genomics.org/plink/1.9/input#vcf

You may want to check some of the other parameters that can be used when converting, such as:

  • --keep-allele-order
  • --vcf-idspace-to
  • --const-fid
  • --allow-extra-chr
  • --split-x

Kevin

ADD REPLYlink written 7 months ago by Kevin Blighe54k

thanks Kevin, I try to do that but end up wit a file that has 0 for each individual.

ADD REPLYlink written 7 months ago by Nandini840

Any log or error messages?

ADD REPLYlink written 7 months ago by Kevin Blighe54k

No, so this is what I did (playing around with a chunk of chr22)

##convert vcf to plink
./plink --vcf input_genotype.vcf --keep-allele-order  --allow-extra-chr 0 --make-bed --out MyData

##tried to calculate IBS using King
/king -b MyData.bed --ibs

The result from above is an empty file

Then I tried to convert bim bam bed to map and ped

./plink --bfile MyData --recode --tab --out test

The ped file is just 0

NA12891 NA12891 0       0       0       -9      0 0     0 0     0 0     0 0     0 0
ADD REPLYlink written 7 months ago by Nandini840

If you avoid using --make-bed and instead produce a plain text PLINK dataset, can you then see data in that?

ADD REPLYlink written 7 months ago by Kevin Blighe54k

still the same. I don't think PLINK can handle phased Haplotype file in the format 0|1, where 0 indicates ref and 1 is for the alt allele. Can it ?

ADD REPLYlink modified 7 months ago • written 7 months ago by Nandini840

It can handle phased, as I show in Step 5, here: Produce PCA bi-plot for 1000 Genomes Phase III - Version 2 (the 1000 Genomes data is all phased). I will move my answer back to a comment. Perhaps the PLINK developer will pick it up later (in different time zone).

ADD REPLYlink written 7 months ago by Kevin Blighe54k
1

Ah, for now, I was able to sort my issue out using the R package SNPrelate. It takes in the vcf file as input, calculates IBS and then if one feels like it, it can also convert it to PLINK format !

ADD REPLYlink written 7 months ago by Nandini840
1
gravatar for Gabriel R.
7 months ago by
Gabriel R.2.7k
Danmarks Tekniske Universitet
Gabriel R.2.7k wrote:

You can try glactools, it is just one line:

glactools vcfm2acf --onlyGT  --fai /Data/reference/human_g1k_v37.fasta.fai  /tmp/biostars/test.vcf  | glactools acf2bplink - /tmp/biostars/test

You need to edit the genetic distance in the bim file and probably the fam files as well.

ADD COMMENTlink written 7 months ago by Gabriel R.2.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1646 users visited in the last hour