How to convert VCF genotypes (0/0, 0/1, 1/1) to SNP alleles
5.9 years ago
taoyan ▴ 10

Hello, I have a VCF file like this:

#CHROM     POS     ID   REF ALT QUAL FILTER INFO    FORMAT  R4157   R4158   R4163
chr7    30902031    .    C   A   .      .    PR       GT     0/0     0/0     0/0

Now I want to convert it to a format like this:

#CHROM    POS      ID   REF ALT QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chrC07  30902031    .    C   A   .         .     PR       GT     C        C       C

Does anyone know how to do this? Thanks!

sequence snp • 4.9k views

What does C mean for an individual? Shouldn't that be CC for a diploid genome?


Yeah, C means CC, and if it is CA, we use N or Y to represent it.
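
For reference, single-letter encoding of heterozygous calls is usually done with the IUPAC ambiguity codes, where Y stands for C/T and a C/A genotype is written M. The following is a minimal Python sketch of that lookup, not part of the original thread; the function name `iupac_code` is hypothetical, and it assumes diploid SNP genotypes with single-base alleles, falling back to N for anything else.

```python
# IUPAC ambiguity codes for unordered pairs of different bases.
IUPAC = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def iupac_code(allele1: str, allele2: str) -> str:
    """Return one letter for a diploid SNP genotype, or N if it cannot be encoded."""
    a, b = allele1.upper(), allele2.upper()
    # Homozygous single-base call: the base itself (C/C becomes C).
    if a == b and len(a) == 1 and a in "ACGT":
        return a
    # Heterozygous call: look up the unordered pair; indels and unknowns give N.
    return IUPAC.get(frozenset((a, b)), "N")
```

With this table, `iupac_code("C", "A")` gives `M` and `iupac_code("C", "C")` gives `C`.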


Why do you want to do this? What is your final aim?


Thank you for your reply! I now have a GWAS result, and I want to check whether any SNPs are located in certain genes. Then I need to know whether these SNPs affect the function of those genes.


Hello taoyan,

But then I think it is not a good idea to convert the genotypes that way. A lot of programs work with the 0/0 format, as it makes it much faster to tell whether you have a reference or an alternative allele.

So please check first what input the programs you would like to use expect.
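
To illustrate why the numeric coding is convenient, here is a small Python sketch (an illustration only, not taken from any particular tool) that counts alternative alleles directly from a GT string without looking at REF or ALT. The function name `alt_allele_count` is hypothetical.

```python
def alt_allele_count(gt: str):
    """Count non-reference alleles in a GT string such as '0/1' or '1|1'.

    Returns None when any allele is missing ('.').
    """
    # Treat phased ('|') and unphased ('/') separators the same way.
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles or "" in alleles:
        return None
    # Index 0 is the reference allele; every other index is an ALT allele.
    return sum(1 for a in alleles if a != "0")
```

Here `0/0` gives 0, `0/1` gives 1 and `1/1` gives 2, which is the kind of dosage value many downstream programs work with.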

fin swimmer

5.9 years ago
JC 13k

I'm assuming that you want to convert each VCF record into the genetic variant observed, so "0/0" is considered "C" and "1/1" is "A" and "0/1" is "CA".

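#!/usr/bin/perl

use strict;
use warnings;

# Read a VCF from STDIN (or from files given as arguments), line by line.
while (<>) {
  # Header and meta lines start with '#': print them unchanged.
  if (/^#/) {
    print;
    next;
  }
  chomp;
  my @a = split (/\t/, $_);
  my $ref = $a[3];   # REF column
  my $alt = $a[4];   # ALT column (a single alternative allele is assumed)
  # Sample columns start at index 9, after FORMAT.
  for (my $i=9; $i <= $#a; $i++) {
    if ($a[$i] eq "0/0") {
      $a[$i] = $ref;           # homozygous reference
    } elsif ($a[$i] eq "1/1") {
      $a[$i] = $alt;           # homozygous alternative
    } elsif ($a[$i] eq "0/1") {
      $a[$i] = "$ref$alt";     # heterozygous
    } else {
      $a[$i] = "-";            # anything else (missing, phased, other alleles)
    }
  }
  print join "\t", @a;
  print "\n";
}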

This produces:

$ perl convertVariant.pl < file.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chr7    30902031        .       C       A       .       .       PR      GT      C       C       C

It works! Thank you!


Hi, I am working on a polyploid crop, and my VCF file has SNP data such as 0/2, 1/2, 0/3, 2/3, etc. in addition to the standard 0/0, 1/1 and 0/1 SNP calls. Will this script work for my data? I tried to convert the VCF to HapMap using TASSEL, which converted these codings to single letters like R, S, etc. depending on the SNP, but it's difficult to track down which allele has been assigned to which letter. Thanks.


Unfortunately it will not work for you: the script only considers a diploid genome (2 alleles), and a genotype like 0/1/2/3 means the sample is tetraploid. Please post a new question with an example of your VCF; I (and others) could modify the script to handle that type of genome.
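
One possible generalization, sketched here in Python rather than Perl and not taken from the thread, is to treat each number in the genotype as an index into the list of REF plus the comma-separated ALT alleles. This assumes the ALT column lists all alternative alleles separated by commas and that GT is the first colon-separated sub-field of each sample column, as laid out in the VCF specification. The function names `decode_genotype` and `convert_line` are hypothetical. Unlike the script above, it writes every allele explicitly (C/C rather than C), so the output stays unambiguous for any ploidy.

```python
def decode_genotype(sample: str, ref: str, alt: str, missing: str = "-") -> str:
    """Translate a sample field such as '0/2' or '1|2:35' into allele letters."""
    gt = sample.split(":")[0]            # keep only the GT sub-field
    alleles = [ref] + alt.split(",")     # index 0 = REF, 1..n = ALT alleles
    sep = "|" if "|" in gt else "/"      # preserve phased vs. unphased separator
    calls = []
    for idx in gt.split(sep):
        # Missing calls ('.') or out-of-range indices make the genotype unusable.
        if not idx.isdigit() or int(idx) >= len(alleles):
            return missing
        calls.append(alleles[int(idx)])
    return sep.join(calls)


def convert_line(line: str) -> str:
    """Convert one VCF line; header lines are returned unchanged."""
    if line.startswith("#"):
        return line
    fields = line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    # Sample columns start at index 9, after FORMAT.
    fields[9:] = [decode_genotype(s, ref, alt) for s in fields[9:]]
    return "\t".join(fields) + "\n"
```

To run it over a file, one could loop over `sys.stdin` and write `convert_line(line)` for each line to `sys.stdout`. With REF `C` and ALT `A,T`, the genotype `0/2` becomes `C/T` and a tetraploid call `0/1/1/2` becomes `C/A/A/T`.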
