Question: How to convert vcf format with 0/0 0/1 1/1 to SNP
6 months ago, taoyan (ZheJiang University, Hangzhou, China) wrote:

Hello, I have a VCF file like this:

#CHROM     POS     ID   REF ALT QUAL FILTER INFO    FORMAT  R4157   R4158   R4163
chr7    30902031    .    C   A   .      .    PR       GT     0/0     0/0     0/0

Now I want to convert it to a format like this:

#CHROM    POS      ID   REF ALT QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chrC07  30902031    .    C   A   .         .     PR       GT     C        C       C

Does anyone know how to do this? Thanks!


What does C mean for an individual? Shouldn't that be CC for a diploid genome?

written 6 months ago by WouterDeCoster

Yes, C means CC, and if it is CA, we use N or Y to represent it.

written 6 months ago by taoyan
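
For reference, the standard IUPAC nucleotide ambiguity codes assign one letter to each pair of bases: under that convention a C/A heterozygote is M, while Y stands for C/T and N for any base. The Perl sketch below shows one way to look these up. The helper name iupac_code is hypothetical and does not come from the thread, and the sketch assumes single-base alleles only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard IUPAC ambiguity codes for two different bases.
# Keys are the two bases in alphabetical order.
my %IUPAC = (
    AC => 'M', AG => 'R', AT => 'W',
    CG => 'S', CT => 'Y', GT => 'K',
);

# iupac_code is a hypothetical helper (not from the thread).
# Returns the base itself for a homozygote, the ambiguity code for a
# heterozygote, and 'N' when either allele is not a single A/C/G/T.
sub iupac_code {
    my ($x, $y) = map { uc } @_;
    return 'N' unless $x =~ /^[ACGT]\z/ && $y =~ /^[ACGT]\z/;
    return $x if $x eq $y;
    return $IUPAC{ join '', sort ($x, $y) };
}

print iupac_code('C', 'C'), "\n";   # prints C
print iupac_code('C', 'A'), "\n";   # prints M
print iupac_code('C', 'T'), "\n";   # prints Y
```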

Why do you want to do this? What is your final aim?

written 6 months ago by Pierre Lindenbaum

Thank you for your reply! I now have a GWAS result, and I want to check whether any SNPs are located in certain genes. Then I need to know whether these SNPs affect the function of those genes.

written 6 months ago by taoyan

Hello taoyan,

in that case I don't think it is a good idea to convert the genotypes this way. Many programs work with the 0/0 notation, because it makes it much faster to tell whether a sample carries the reference or an alternative allele.

So please check first which input format the programs you want to use expect.

fin swimmer

written 6 months ago by finswimmer
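
As an illustration of why the numeric notation is convenient, the Perl sketch below counts non-reference alleles straight from a genotype string, without looking at REF or ALT at all. The helper name alt_allele_count is hypothetical, and the sketch assumes GT is the first subfield of each sample column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# alt_allele_count is a hypothetical helper (not from the thread).
# It takes one sample field such as "0/1" or "1|1:35" and returns the
# number of non-reference alleles, or undef when the call is missing.
sub alt_allele_count {
    my ($field) = @_;
    return unless defined $field && length $field;
    my ($gt) = split /:/, $field;        # GT assumed to be the first subfield
    my @alleles = split m{[/|]}, $gt;    # "/" = unphased, "|" = phased
    return if !@alleles || grep { !/^\d+\z/ } @alleles;
    return scalar grep { $_ > 0 } @alleles;
}

# Illustrative genotype strings (only 0/0 appears in the question's record).
for my $field ('0/0', '0/1', '1/1', './.') {
    my $n = alt_allele_count($field);
    printf "%s\t%s\n", $field, defined $n ? $n : 'missing';
}
```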
6 months ago, JC (Mexico) wrote:

I'm assuming that you want to replace each genotype call with the alleles observed, so with REF=C and ALT=A, "0/0" becomes "C", "1/1" becomes "A" and "0/1" becomes "CA". Any other value (for example a missing call) is written as "-".

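#!/usr/bin/perl

use strict;
use warnings;

# Read the VCF from STDIN or from files named on the command line.
while (<>) {
  # Pass header lines (## meta lines and the #CHROM line) through unchanged.
  if (/^#/) {
    print;
    next;
  }
  chomp;
  my @a = split (/\t/, $_);
  my $ref = $a[3];   # REF allele
  my $alt = $a[4];   # ALT allele (assumed to be a single allele)
  # Sample columns start at index 9, right after the FORMAT column.
  for (my $i=9; $i <= $#a; $i++) {
    if ($a[$i] eq "0/0") {
      $a[$i] = $ref;           # homozygous reference
    } elsif ($a[$i] eq "1/1") {
      $a[$i] = $alt;           # homozygous alternate
    } elsif ($a[$i] eq "0/1") {
      $a[$i] = "$ref$alt";     # heterozygous
    } else {
      $a[$i] = "-";            # anything else, e.g. a missing call
    }
  }
  print join "\t", @a;
  print "\n";
}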

This produces:

$ perl convertVariant.pl < file.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  R4157   R4158   R4163
chr7    30902031        .       C       A       .       .       PR      GT      C       C       C
written 6 months ago by JC
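
The script above matches the strings "0/0", "0/1" and "1/1" exactly. A possible extension, sketched below in Perl and not part of JC's answer, also accepts phased calls ("0|1"), comma-separated ALT alleles, and sample columns with extra subfields after GT. The helper names genotype_to_alleles and convert_line are hypothetical, and the sketch assumes GT is the first FORMAT subfield.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# genotype_to_alleles is a hypothetical helper (not part of JC's script).
# $field: a sample column such as "0/1", "1|2" or "0/1:35:99"
# $ref:   the REF allele; $alt: the ALT column, possibly comma-separated
sub genotype_to_alleles {
    my ($field, $ref, $alt) = @_;
    my @alleles = ($ref, split /,/, $alt);   # index 0 = REF, 1.. = ALT alleles
    my ($gt) = split /:/, $field;            # assumes GT is the first subfield
    return '-' unless defined $gt;
    my @idx = split m{[/|]}, $gt;            # "/" = unphased, "|" = phased
    # Missing calls (".") or indexes beyond the ALT list become "-".
    return '-' if !@idx || grep { !/^\d+\z/ || $_ > $#alleles } @idx;
    my %seen;
    my @obs = grep { !$seen{$_}++ } map { $alleles[$_] } @idx;
    return join '', @obs;                    # "C" for 0/0, "CA" for 0/1
}

# Convert one chomped line; header lines are returned unchanged.
sub convert_line {
    my ($line) = @_;
    return $line if $line =~ /^#/;
    my @col = split /\t/, $line;
    $_ = genotype_to_alleles($_, @col[3, 4]) for @col[9 .. $#col];
    return join "\t", @col;
}

# Demo on the record from the question, with two sample calls changed.
print convert_line("chr7\t30902031\t.\tC\tA\t.\t.\tPR\tGT\t0/0\t0|1\t./."), "\n";
```

To process a whole file, the demo line could be replaced by a loop that reads each line, chomps it, and prints the result of convert_line followed by a newline.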

It works! Thank you!

written 6 months ago by taoyan