Losing my mind with a VCF problem
5 weeks ago
a.beggs

Hi all

I have a VCF file with the following lines:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE
chr17   23197000        Spectre.DEL.7ROFFYQK    N       LOSS    .       .       END=25683000;SVLEN=2486000;SVTYPE=LOSS;CN=0     GT:HO:GQ        1/1:0.0:60
chr18   19357000        Spectre.DEL.8B1N5YFJ    N       LOSS    .       .       END=20560000;SVLEN=1203000;SVTYPE=LOSS;CN=0     GT:HO:GQ        1/1:0.0:60
chr1_KI270709v1_random  2000    Spectre.DUP.Y9R4QQKP    N       GAIN    .       .       END=18000;SVLEN=16000;SVTYPE=GAIN;CN=42 GT:HO:GQ        ./.:0.0:60
chr2_KI270715v1_random  143000  Spectre.DUP.7IRZ6XDF    N       GAIN    .       .       END=160000;SVLEN=17000;SVTYPE=GAIN;CN=5 GT:HO:GQ        ./.:0.0:60
chr9_KI270719v1_random  137000  Spectre.DUP.YC1FK3L0    N       GAIN    .       .       END=173000;SVLEN=36000;SVTYPE=GAIN;CN=4 GT:HO:GQ        ./.:0.0:60
chr11_KI270721v1_random 5000    Spectre.DUP.YB0LB1EU    N       GAIN    .       .       END=18000;SVLEN=13000;SVTYPE=GAIN;CN=4  GT:HO:GQ        ./.:0.0:60

For various reasons the tertiary analysis pipeline I am feeding the VCF into is extremely fussy about its input. It wants:

  • SVTYPE has to be CNV
  • ALT allele needs to be <CNV>
  • the ID field is used to determine whether the call is a LOSS or a GAIN, so it needs to include that text
  • FORMAT/CN field is required for copy number

I have tried pyVCF, BCFtools and awk to convert it to look like this but can't seem to make it work. Does anyone have the VCF wizardry to give me some pointers, please? The main issues are moving CN from INFO to FORMAT and adding LOSS/GAIN to the ID field.
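To see the INFO-to-FORMAT step on its own, here is a minimal Perl sketch. `move_info_to_format` is a hypothetical helper, not part of any library; it assumes INFO is a plain `;`-separated list of `KEY=VALUE` entries (or flags), and it deliberately leaves SVTYPE, ALT and ID alone.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: move one INFO key (e.g. CN) into FORMAT and the sample column.
# Splitting INFO on ';' means only an exact key matches, so "XCN=3" is not mistaken for CN.
sub move_info_to_format {
    my ($info, $format, $sample, $key) = @_;
    my $value = '.';                                # VCF missing value if the key is absent
    my @kept;
    for my $field (split /;/, $info) {
        if ($field =~ /^\Q$key\E=(.*)$/) {
            $value = $1;                            # keep the value, drop the INFO entry
        } else {
            push @kept, $field;
        }
    }
    my $new_info = @kept ? join(';', @kept) : '.';  # an empty INFO column is written as '.'
    return ($new_info, "$format:$key", "$sample:$value");
}

# Example using the chr17 record from the question.
my ($info, $format, $sample) = move_info_to_format(
    'END=25683000;SVLEN=2486000;SVTYPE=LOSS;CN=0', 'GT:HO:GQ', '1/1:0.0:60', 'CN');
print "$info\t$format\t$sample\n";
# prints (tab-separated): END=25683000;SVLEN=2486000;SVTYPE=LOSS  GT:HO:GQ:CN  1/1:0.0:60:0
```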


The main issue is getting the CN from INFO to FORMAT:

Do you mean something like the following diff?

- chr18   19357000        Spectre.DEL.8B1N5YFJ    N   LOSS    .       .       END=20560000;SVLEN=1203000;SVTYPE=LOSS;CN=0     GT:HO:GQ        1/1:0.0:60

+ chr18   19357000        LOSS.Spectre.DEL.8B1N5YFJ    N   <CNV>    .       .      END=20560000;SVLEN=1203000;SVTYPE=CNV     GT:HO:GQ:CN        1/1:0.0:60:0

You also need to modify the header.
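A minimal Perl sketch of the header change, assuming the pipeline accepts the reserved `CNV` ALT and `CN` FORMAT definitions from the VCF 4.2 specification (worth checking what it actually expects). `add_header_lines` is a hypothetical helper: it only adds lines whose ID is not already declared, and it places them just before `#CHROM`.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Declarations for the new ALT symbol and FORMAT field; descriptions follow the
# VCF 4.2 reserved definitions and may need adjusting for a particular pipeline.
my @extra = (
    '##ALT=<ID=CNV,Description="Copy number variable region">',
    '##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">',
);

# Hypothetical helper: return the header with any missing @extra lines inserted before #CHROM.
sub add_header_lines {
    my @header = @_;
    my @missing;
    for my $extra (@extra) {
        my ($prefix) = $extra =~ /^(##\w+=<ID=[^,]+,)/;   # e.g. "##FORMAT=<ID=CN,"
        push @missing, $extra unless grep { index($_, $prefix) == 0 } @header;
    }
    my @out;
    for my $line (@header) {
        push @out, @missing if $line =~ /^#CHROM/;        # new declarations go right before #CHROM
        push @out, $line;
    }
    return @out;
}

my @fixed = add_header_lines(
    '##fileformat=VCFv4.2',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
);
print "$_\n" for @fixed;
```

Running it a second time on its own output adds nothing, so it is safe to apply to a file whose header has already been edited by hand or with awk.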

What kind of pipeline is this?


Yeah, that's what I'm looking for; awk can take care of the header for me. Are you saying diff can do this?!


That should be easy to script in Perl. The diff was just my way of defining the changes to be made to your file.


I have tried pyVCF, BCFtools and awk to convert it to look like this but can't seem to make it work.

So this is your real question. Show us the code.

5 weeks ago
Michael

usage: perl vcffixxer.pl myfile.vcf > myfile-fixed.vcf

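#!/usr/bin/env perl
use strict;
use warnings;

# Copy the header through unchanged, up to and including the #CHROM line.
while (<>) {
  print;
  last if /^#CHROM/;
}
# Rewrite the data records.
while (<>) {
  if (/^#/) { print; next; }
  chomp;
  my @l = split /\s*\t\s*/;
  my $type = $l[4];
  # Pass anything that is not a LOSS/GAIN record through untouched.
  unless ($type eq 'LOSS' || $type eq 'GAIN') {
    print "$_\n";
    next;
  }
  $l[4] = '<CNV>';        # ALT allele becomes <CNV>
  $l[2] .= ".$type";      # keep LOSS/GAIN visible in the ID
  # Split INFO on ';' so only an exact CN key is moved (not e.g. XCN=).
  my @info = split /;/, $l[7];
  my ($cn) = map { /^CN=(.*)$/ ? $1 : () } @info;
  @info = grep { !/^CN=/ } @info;
  s/^SVTYPE=(?:LOSS|GAIN)$/SVTYPE=CNV/ for @info;
  $l[7] = @info ? join(';', @info) : '.';
  $cn = '.' unless defined $cn;   # VCF missing value if INFO/CN is absent
  $l[8] .= ':CN';         # add CN to FORMAT ...
  $l[9] .= ":$cn";        # ... and its value to the sample column

  print join("\t", @l), "\n";
}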

Output:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  SAMPLE
chr17   23197000    Spectre.DEL.7ROFFYQK.LOSS   N   <CNV>   .   .   END=25683000;SVLEN=2486000;SVTYPE=CNV   GT:HO:GQ:CN 1/1:0.0:60:0
chr18   19357000    Spectre.DEL.8B1N5YFJ.LOSS   N   <CNV>   .   .   END=20560000;SVLEN=1203000;SVTYPE=CNV   GT:HO:GQ:CN 1/1:0.0:60:0
chr1_KI270709v1_random  2000    Spectre.DUP.Y9R4QQKP.GAIN   N   <CNV>   .   .   END=18000;SVLEN=16000;SVTYPE=CNV    GT:HO:GQ:CN ./.:0.0:60:42
chr2_KI270715v1_random  143000  Spectre.DUP.7IRZ6XDF.GAIN   N   <CNV>   .   .   END=160000;SVLEN=17000;SVTYPE=CNV   GT:HO:GQ:CN ./.:0.0:60:5
chr9_KI270719v1_random  137000  Spectre.DUP.YC1FK3L0.GAIN   N   <CNV>   .   .   END=173000;SVLEN=36000;SVTYPE=CNV   GT:HO:GQ:CN ./.:0.0:60:4
chr11_KI270721v1_random 5000    Spectre.DUP.YB0LB1EU.GAIN   N   <CNV>   .   .   END=18000;SVLEN=13000;SVTYPE=CNV    GT:HO:GQ:CN ./.:0.0:60:4
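Before feeding the result to the pipeline, a quick check against the four requirements from the question can save a round trip. This is a hedged Perl sketch: `record_problems` is a hypothetical name, and it only inspects the text of each record; it cannot tell whether the pipeline will actually accept the file.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical checker for the requirements listed in the question.
# Returns a list of problems for one tab-separated data line (empty list means OK).
sub record_problems {
    my ($line) = @_;
    my @f = split /\t/, $line;
    return ('fewer than 10 columns') if @f < 10;
    my @problems;
    push @problems, 'ALT is not <CNV>' unless $f[4] eq '<CNV>';
    push @problems, 'INFO/SVTYPE is not CNV' unless $f[7] =~ /(?:^|;)SVTYPE=CNV(?:;|$)/;
    push @problems, 'ID lacks LOSS/GAIN' unless $f[2] =~ /LOSS|GAIN/;
    my @keys   = split /:/, $f[8];
    my @values = split /:/, $f[9];
    push @problems, 'FORMAT has no CN' unless grep { $_ eq 'CN' } @keys;
    push @problems, 'FORMAT and sample field counts differ' unless @keys == @values;
    return @problems;
}

# One rewritten record and the matching original record from the question.
my $good = join "\t", 'chr17', 23197000, 'Spectre.DEL.7ROFFYQK.LOSS', 'N', '<CNV>', '.', '.',
    'END=25683000;SVLEN=2486000;SVTYPE=CNV', 'GT:HO:GQ:CN', '1/1:0.0:60:0';
my $bad = join "\t", 'chr17', 23197000, 'Spectre.DEL.7ROFFYQK', 'N', 'LOSS', '.', '.',
    'END=25683000;SVLEN=2486000;SVTYPE=LOSS;CN=0', 'GT:HO:GQ', '1/1:0.0:60';
print "good: ", join('; ', record_problems($good)) || 'OK', "\n";
print "bad:  ", join('; ', record_problems($bad)), "\n";
```

To check a whole file, call it on every line that does not start with `#`.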