Question: How to convert VCF genotypes (0/0, 0/1, 1/1) to SNP alleles
taoyan (ZheJiang University, Hangzhou, China) wrote 2.2 years ago:

Hello, I have a VCF file like this:

#CHROM     POS     ID   REF ALT QUAL FILTER INFO    FORMAT  R4157   R4158   R4163
chr7    30902031    .    C   A   .      .    PR       GT     0/0     0/0     0/0

Now I want to convert it to a format like this:

#CHROM    POS      ID   REF ALT QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chrC07  30902031    .    C   A   .         .     PR       GT     C        C       C

Does anyone know how to do this? Thanks!

snp sequence • 1.7k views
modified 2.2 years ago by JC • written 2.2 years ago by taoyan

What does C mean for an individual? Shouldn't that be CC for a diploid genome?

written 2.2 years ago by WouterDeCoster

Yes, C means CC, and if it is CA, we use N or Y to represent it.

written 2.2 years ago by taoyan

Why do you want to do this? What is your final aim?

written 2.2 years ago by Pierre Lindenbaum

Thank you for your reply! I now have a GWAS result, and I want to check whether any SNPs are located in certain genes. Then I need to know whether these SNPs affect the function of those genes.

written 2.2 years ago by taoyan
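
Checking whether a SNP falls inside a gene comes down to an interval lookup on chromosome and position. A minimal Python sketch of that lookup, assuming 1-based inclusive gene coordinates; the function name, the gene name and the gene coordinates are hypothetical placeholders, not real annotation:

```python
def snps_in_genes(snps, genes):
    """Return (chrom, pos, gene_name) for every SNP inside a gene interval.

    snps  : iterable of (chrom, pos) tuples, pos 1-based as in VCF
    genes : iterable of (chrom, start, end, name), 1-based inclusive
    """
    hits = []
    for chrom, pos in snps:
        for g_chrom, start, end, name in genes:
            if chrom == g_chrom and start <= pos <= end:
                hits.append((chrom, pos, name))
    return hits


# Hypothetical gene interval, for illustration only.
genes = [("chr7", 30900000, 30905000, "geneA")]
snps = [("chr7", 30902031), ("chr7", 31000000)]
print(snps_in_genes(snps, genes))
```

This only reports overlaps; it says nothing about whether a SNP changes the gene's function.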

Hello taoyan,

In that case, I don't think it is a good idea to convert the genotypes that way. Many programs work with the 0/0 format, because it makes it much faster to tell whether you have a reference or an alternative allele.

So please check first what input the programs you want to use expect.

fin swimmer

written 2.2 years ago by finswimmer
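
To illustrate why the numeric encoding is convenient: counting alternative alleles in a genotype is a plain string operation that never has to look at REF or ALT. A small Python sketch (the function name is made up for this example):

```python
import re


def alt_allele_count(gt):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'.

    Returns None when any allele is missing ('.').
    """
    alleles = re.split(r"[/|]", gt)   # accept unphased (/) and phased (|)
    if "." in alleles:
        return None
    return sum(1 for a in alleles if a != "0")


for gt in ("0/0", "0/1", "1/1", "./."):
    print(gt, alt_allele_count(gt))
```

With base letters in place of the codes, the same count would need the REF column for every record.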
JC (Mexico) wrote 2.2 years ago:

I'm assuming that you want to convert each VCF genotype into the observed alleles, so "0/0" becomes "C", "1/1" becomes "A", and "0/1" becomes "CA".

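#!/usr/bin/perl
# Replace diploid GT codes (0/0, 0/1, 1/1) with the REF/ALT bases.

use strict;
use warnings;

while (<>) {
  # Header lines pass through unchanged
  if (/^#/) {
    print;
    next;
  }
  chomp;
  my @a = split (/\t/, $_);
  my $ref = $a[3];
  my $alt = $a[4];
  # Sample columns start at index 9 (right after FORMAT)
  for (my $i=9; $i <= $#a; $i++) {
    if ($a[$i] eq "0/0") {
      $a[$i] = $ref;
    } elsif ($a[$i] eq "1/1") {
      $a[$i] = $alt;
    } elsif ($a[$i] eq "0/1") {
      $a[$i] = "$ref$alt";
    } else {
      # Anything else (missing calls, phased 0|1, other allele indices) becomes "-"
      $a[$i] = "-";
    }
  }
  print join "\t", @a;
  print "\n";
}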

This produces:

$ perl convertVariant.pl < file.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chr7    30902031        .       C       A       .       .       PR      GT      C       C       C
written 2.2 years ago by JC

It works! Thank you!

written 2.2 years ago by taoyan

Hi, I am working on a polyploid crop, and my VCF file has SNP genotypes such as 0/2, 1/2, 0/3 and 2/3 in addition to the standard 0/0, 1/1 and 0/1 calls. Will this script work for my data? I tried converting the VCF to HapMap with TASSEL, which turned these codes into single letters such as R and S depending on the SNP, but it is difficult to track which alleles each letter stands for. Thanks.

modified 10 months ago • written 11 months ago by aman-coasab
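
The single letters described above look like the standard IUPAC nucleotide ambiguity codes; that is an assumption here, since the actual TASSEL output is not shown. Under that assumption, a small Python lookup (names chosen for this example) maps each letter back to its pair of alleles:

```python
# Two-base IUPAC nucleotide ambiguity codes.
IUPAC_TO_BASES = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}


def decode_call(code):
    """Return the two alleles behind a one-letter genotype call.

    Plain bases (A, C, G, T) are treated as homozygous; anything
    unrecognised (for example N) returns None.
    """
    code = code.upper()
    if len(code) == 1 and code in "ACGT":
        return (code, code)
    return IUPAC_TO_BASES.get(code)


for call in ("R", "S", "C", "N"):
    print(call, decode_call(call))
```

By this table, a C/A heterozygote is written M.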

Unfortunately, it will not work for you: the script considers only a diploid genome (2 alleles), and a genotype like 0/1/2/3 means the sample is tetraploid. Please post this as a new question with an example of your VCF, so that I (or others) can modify the script to handle that type of genome.

written 10 months ago by JC
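
One possible generalisation, sketched in Python rather than Perl: build a list of REF plus the comma-separated ALT alleles and index into it, so that codes such as 0/2 or 2/3, and genotypes with more than two alleles, resolve to bases. This is an illustrative sketch, not JC's script. The function names, the choice to join differing alleles with "/", and the second ALT allele in the demo record are assumptions made here.

```python
import re


def decode_genotype(gt_field, alleles):
    """Translate a sample field such as '0/2' or '1|2:35' into bases."""
    gt = gt_field.split(":")[0]           # GT is the first FORMAT subfield
    bases = []
    for idx in re.split(r"[/|]", gt):     # unphased (/) or phased (|)
        if not idx.isdigit() or int(idx) >= len(alleles):
            return "-"                    # missing or malformed call
        bases.append(alleles[int(idx)])
    if len(set(bases)) == 1:
        return bases[0]                   # homozygous: print the allele once
    return "/".join(bases)


def convert_line(line):
    """Convert one VCF line; header lines pass through unchanged."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    alleles = [fields[3]] + fields[4].split(",")   # index 0 = REF, 1.. = ALT
    fields[9:] = [decode_genotype(f, alleles) for f in fields[9:]]
    return "\t".join(fields) + "\n"


# Hypothetical multi-allelic record (the second ALT allele is invented).
record = "chr7\t30902031\t.\tC\tA,T\t.\t.\tPR\tGT\t0/0\t0/2\t1/2\n"
print(convert_line(record), end="")
```

Unlike the Perl version, heterozygous calls are written with a separator (C/A rather than CA), so that alleles longer than one base stay readable.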