Question: CNV calls in VCF format, conversion to PCAWG-11 Calibration
2.8 years ago, mp85 wrote:

Hello, I have a VCF file where copy number variations are listed in this format:

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of this structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=CNV,Description="Copy number variable region">
##FORMAT=<ID=TCN,Number=1,Type=Integer,Description="Total copy number">
##FORMAT=<ID=MCN,Number=1,Type=Integer,Description="Minor allele copy number">
1   564620  .   A   <CNV>   .   .   SVTYPE=CNV;END=232864203    GT:TCN:MCN  ./.:2:1 ./.:2:1
1   232864349   .   G   <CNV>   .   .   SVTYPE=CNV;END=232917630    GT:TCN:MCN  ./.:2:1 ./.:3:1
1   232917822   .   A   <CNV>   .   .   SVTYPE=CNV;END=249198692    GT:TCN:MCN  ./.:2:1 ./.:2:1

(I have included only the relevant fields.)
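To see where each value lives in such a record, here is a minimal sketch using only the Python standard library. It assumes the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, then one column per sample). `parse_cnv_record` is a hypothetical helper name, and the post does not say which of the two sample columns is the tumor.

```python
from typing import Dict, List


def parse_cnv_record(line: str) -> Dict[str, object]:
    """Split one VCF data line into the fields needed for a CNV segment.

    Assumes the standard VCF column order:
    CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE1 [SAMPLE2 ...]
    """
    # Real VCFs are tab-separated; fall back to any whitespace for pasted text.
    cols = line.rstrip("\n").split("\t") if "\t" in line else line.split()
    chrom, pos, info, fmt = cols[0], int(cols[1]), cols[7], cols[8]

    # INFO is a ';'-separated list of KEY=VALUE pairs (flags have no '=').
    info_fields = dict(
        item.split("=", 1) if "=" in item else (item, True)
        for item in info.split(";")
    )

    # FORMAT names the ':'-separated values found in every sample column.
    keys = fmt.split(":")
    samples: List[Dict[str, str]] = [
        dict(zip(keys, sample.split(":"))) for sample in cols[9:]
    ]
    return {
        "chrom": chrom,
        "pos": pos,                       # POS column
        "end": int(info_fields["END"]),  # INFO/END
        "samples": samples,               # per-sample GT, TCN, MCN
    }
```

For the second record above, `rec["samples"][1]` would be `{"GT": "./.", "TCN": "3", "MCN": "1"}`; TCN and MCN stay strings until you convert them.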

I need to convert from this format to the PCAWG-11 Calibration format, which looks like this:

chromosome  start   end copy_number minor_cn    major_cn    cellular_prevalence
1   640305  239120876   2   1   1   0.94
2   59261869    91121847    0   0   0   0.88
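The example rows are consistent with a few simple relationships between the columns: copy_number = minor_cn + major_cn, minor_cn <= major_cn, and cellular_prevalence is a fraction between 0 and 1. A small checker like the hypothetical `check_pcawg_row` below (a sketch, not an official PCAWG validator) can catch conversion mistakes before the file is used.

```python
from typing import List, Tuple

# Column order copied from the PCAWG-11 example above.
PCAWG_COLUMNS = ["chromosome", "start", "end", "copy_number",
                 "minor_cn", "major_cn", "cellular_prevalence"]


def check_pcawg_row(row: List[str]) -> Tuple[bool, str]:
    """Sanity-check one whitespace-split row of the PCAWG-11 table."""
    if len(row) != len(PCAWG_COLUMNS):
        return False, f"expected {len(PCAWG_COLUMNS)} columns, got {len(row)}"
    start, end = int(row[1]), int(row[2])
    total, minor, major = int(row[3]), int(row[4]), int(row[5])
    prevalence = float(row[6])
    if start > end:
        return False, "start is after end"
    if minor + major != total:
        return False, "minor_cn + major_cn != copy_number"
    if minor > major:
        return False, "minor_cn > major_cn"
    if not 0.0 <= prevalence <= 1.0:
        return False, "cellular_prevalence outside [0, 1]"
    return True, "ok"
```

Both example rows pass, e.g. `check_pcawg_row("1 640305 239120876 2 1 1 0.94".split())` returns `(True, "ok")`.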

I was thinking about writing a converter myself, but I seem to be missing some information (I have little to no bioinformatics experience). In particular:

  • Where do I find the start value in the VCF file? Is it the POS column?
  • Where do I find the major_cn value in the VCF file? As far as I can see, only the minor copy number (MCN) is reported.
  • How can I calculate the cellular_prevalence field? If I'm right, it should be possible to derive it somehow.

Also, it would be great if you could point me to an existing converter to spare me the pain of coding it from scratch. I searched a bit for converters but didn't find anything useful.

Thank you for your replies.

2.7 years ago, markus.riester wrote:
  • Yes, POS is the start.
  • major_cn is the total copy number minus minor_cn, i.e. TCN - MCN (major + minor = total, minor <= major).
  • cellular_prevalence is the fraction of tumor cells that carry this alteration. It looks like your tool does not report this value, so you can set it to 1.
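Put together, the per-segment mapping is small. The sketch below (hypothetical names `PcawgSegment` and `segment_from_vcf`) takes already-extracted POS, INFO/END, FORMAT/TCN and FORMAT/MCN values, computes major_cn = TCN - MCN, and defaults cellular_prevalence to 1 as suggested. The `pos_is_padding_base` switch is an added assumption: under the VCF convention for symbolic structural-variant alleles, POS usually refers to the base just before the event, so some pipelines may want POS + 1. Check what your caller actually does.

```python
from typing import NamedTuple


class PcawgSegment(NamedTuple):
    chromosome: str
    start: int
    end: int
    copy_number: int
    minor_cn: int
    major_cn: int
    cellular_prevalence: float


def segment_from_vcf(chrom: str, pos: int, end: int, tcn: int, mcn: int,
                     cellular_prevalence: float = 1.0,
                     pos_is_padding_base: bool = False) -> PcawgSegment:
    """Map one VCF CNV call (POS, INFO/END, FORMAT/TCN, FORMAT/MCN) to a PCAWG-11 row."""
    # Major copy number is what remains of the total after the minor allele.
    major = tcn - mcn
    if mcn < 0 or major < mcn:
        raise ValueError(f"inconsistent TCN={tcn}, MCN={mcn}: need 0 <= minor <= major")
    # Assumption: optionally shift to the first affected base if POS is a padding base.
    start = pos + 1 if pos_is_padding_base else pos
    return PcawgSegment(chrom, start, end, tcn, mcn, major, cellular_prevalence)
```

For the second sample column of the second record in the question (`./.:3:1`), `segment_from_vcf("1", 232864349, 232917630, tcn=3, mcn=1)` gives copy_number 3, minor_cn 1 and major_cn 2. Raising on MCN > TCN - MCN rather than silently swapping the values makes a caller that orders alleles differently easy to spot.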

There are a gazillion VCF parsers and libraries, for example … or …; try using the search function here to find more.
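If you would rather not add a dependency, an end-to-end converter is short enough to write with the Python standard library. This is a sketch under several assumptions: the VCF is uncompressed and tab-separated, only records with ALT `<CNV>` are converted, the tumor is the last sample column unless a sample name from the `#CHROM` header line is given, records with a missing TCN or MCN (`.`) are skipped, and cellular_prevalence is fixed at 1 because the caller does not report it. The function names are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List, Optional, TextIO

PCAWG_HEADER = ["chromosome", "start", "end", "copy_number",
                "minor_cn", "major_cn", "cellular_prevalence"]


def vcf_to_pcawg_rows(lines: Iterable[str], sample: Optional[str] = None,
                      cellular_prevalence: float = 1.0) -> Iterator[List[str]]:
    """Yield PCAWG-11 rows (as strings) from <CNV> records in a VCF."""
    sample_idx = -1  # assumption: the tumor is the last sample column by default
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            if sample is not None:
                sample_idx = cols.index(sample)  # pick the named sample column
            continue
        if not line.strip() or cols[4] != "<CNV>":
            continue  # blank lines and non-CNV records
        info = dict(kv.split("=", 1) for kv in cols[7].split(";") if "=" in kv)
        values = dict(zip(cols[8].split(":"), cols[sample_idx].split(":")))
        tcn, mcn = values.get("TCN", "."), values.get("MCN", ".")
        if "." in (tcn, mcn):
            continue  # no copy-number call for this sample
        tcn_i, mcn_i = int(tcn), int(mcn)
        # start = POS, end = INFO/END, major = TCN - MCN
        yield [cols[0], cols[1], info["END"], str(tcn_i), str(mcn_i),
               str(tcn_i - mcn_i), str(cellular_prevalence)]


def write_pcawg(lines: Iterable[str], out: TextIO,
                sample: Optional[str] = None) -> None:
    """Write a tab-separated PCAWG-11 table with a header row."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(PCAWG_HEADER)
    writer.writerows(vcf_to_pcawg_rows(lines, sample))
```

A possible call, with hypothetical file and sample names, is `write_pcawg(open("calls.vcf"), sys.stdout, sample="TUMOR")`; for a bgzipped VCF, `gzip.open(path, "rt")` yields text lines that can be passed in the same way. The pasted records in the question use spaces, but a real VCF is tab-separated, which is what this sketch expects.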
