Question: CNV calls in VCF format, conversion to PCAWG-11 Calibration
mp85 wrote (7 weeks ago):

Hello, I have a VCF file where copy number variations are listed in this format:

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of this structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=CNV,Description="Copy number variable region">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=TCN,Number=1,Type=Integer,Description="Total copy number">
##FORMAT=<ID=MCN,Number=1,Type=Integer,Description="Minor allele copy number">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR 
1   564620  .   A   <CNV>   .   .   SVTYPE=CNV;END=232864203    GT:TCN:MCN  ./.:2:1 ./.:2:1
1   232864349   .   G   <CNV>   .   .   SVTYPE=CNV;END=232917630    GT:TCN:MCN  ./.:2:1 ./.:3:1
1   232917822   .   A   <CNV>   .   .   SVTYPE=CNV;END=249198692    GT:TCN:MCN  ./.:2:1 ./.:2:1

(I included only relevant fields)

I need to convert this format to the PCAWG-11 Calibration format, which looks like this:

chromosome  start   end copy_number minor_cn    major_cn    cellular_prevalence
1   640305  239120876   2   1   1   0.94
2   59261869    91121847    0   0   0   0.88

I was thinking about writing a converter myself, but I seem to be missing some information (I have little to no bioinformatics experience). In particular:

  • Where do I find the start value in the VCF file? Is it the POS column?
  • Where do I find the major_cn value in the VCF file? From what I see, only the minor_cn information is available.
  • How can I calculate the cellular_prevalence field? I assume it can be derived somehow.

Also, it would be great if you could point me to an existing converter to spare me the pain of coding it from scratch. I googled for converters a bit but didn't find anything useful.

Thank you for your replies.

Tags: sequencing, snp, next-gen, genome
markus.riester wrote (6 weeks ago):
  • Yes, POS should be start.
  • major_cn is the total copy number minus minor_cn (major + minor = total, minor <= major). In your VCF that is TCN - MCN.
  • cellular_prevalence is the fraction of tumor cells carrying this alteration. It looks like your tool does not report this value, so you can set it to 1.

There are a gazillion VCF parsers and libraries, for example https://github.com/vcflib/vcflib or http://bioconductor.org/packages/stats/bioc/VariantAnnotation. Try using the search function here to find more.
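
If you would rather not pull in a library, the mapping described above is small enough to sketch with the Python standard library alone. This is only a rough sketch, not a tested converter. It assumes the VCF is tab-delimited, that the sample column is named TUMOUR as in the question, and that cellular_prevalence is fixed at 1 because the caller does not report it. The function names `vcf_cnv_to_segments` and `write_segments` are made up for illustration. Note that the VCF specification defines POS for symbolic alleles such as <CNV> as the base preceding the event, so you may prefer POS + 1 as the start. The sketch uses POS unchanged.

```python
import csv

HEADER = ["chromosome", "start", "end", "copy_number",
          "minor_cn", "major_cn", "cellular_prevalence"]


def vcf_cnv_to_segments(lines, sample="TUMOUR", prevalence=1.0):
    """Yield one output row per CNV record for the chosen sample column."""
    sample_idx = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = [f.strip() for f in line.split("\t")]
        if line.startswith("#CHROM"):
            sample_idx = fields.index(sample)  # locate the sample column
            continue
        if sample_idx is None:
            raise ValueError("no #CHROM header line before the records")
        # INFO looks like "SVTYPE=CNV;END=232917630"
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        if info.get("SVTYPE") != "CNV":
            continue
        # Pair the FORMAT keys (GT:TCN:MCN) with the sample's values
        call = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))
        total, minor = int(call["TCN"]), int(call["MCN"])
        # Assumption: start is POS as-is, major_cn = TCN - MCN,
        # cellular_prevalence is a fixed placeholder value
        yield [fields[0], int(fields[1]), int(info["END"]),
               total, minor, total - minor, prevalence]


def write_segments(rows, out):
    """Write the rows as a tab-separated table with the target header."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
```

To use it, open the VCF as a text file and pass the file object to `vcf_cnv_to_segments`, then hand the result to `write_segments` together with an output file opened for writing (or `sys.stdout`).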
