CNV calls in VCF format, conversion to PCAWG-11 Calibration
6.7 years ago
mp85

Hello, I have a VCF file where copy number variations are listed in this format:

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of this structural variant">
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##ALT=<ID=CNV,Description="Copy number variable region">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=TCN,Number=1,Type=Integer,Description="Total copy number">
##FORMAT=<ID=MCN,Number=1,Type=Integer,Description="Minor allele copy number">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOUR 
1   564620  .   A   <CNV>   .   .   SVTYPE=CNV;END=232864203    GT:TCN:MCN  ./.:2:1 ./.:2:1
1   232864349   .   G   <CNV>   .   .   SVTYPE=CNV;END=232917630    GT:TCN:MCN  ./.:2:1 ./.:3:1
1   232917822   .   A   <CNV>   .   .   SVTYPE=CNV;END=249198692    GT:TCN:MCN  ./.:2:1 ./.:2:1

(I included only the relevant fields.)
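For readers new to VCF, here is how one of these data lines is laid out. There are eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). They are followed by a FORMAT column whose colon-separated keys (here GT:TCN:MCN) give the order of the values in every sample column (NORMAL, TUMOUR). The sketch below splits one line into those parts. It uses only the Python standard library, and the function name `parse_vcf_record` is hypothetical. It assumes the file is tab-delimited, as the VCF specification requires, even though the excerpt above is shown with spaces.

```python
def parse_vcf_record(line, sample_names):
    """Split one tab-delimited VCF data line into its parts.

    sample_names: the sample column names from the #CHROM header line,
    e.g. ["NORMAL", "TUMOUR"] for the file above.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info, fmt = fields[:9]

    # INFO: ';'-separated KEY=VALUE pairs; a key without '=' is a flag.
    info_dict = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        info_dict[key] = value if sep else True

    # FORMAT (e.g. GT:TCN:MCN) names the ':'-separated values in each sample column.
    keys = fmt.split(":")
    samples = {
        name: dict(zip(keys, column.split(":")))
        for name, column in zip(sample_names, fields[9:])
    }
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "info": info_dict, "samples": samples}


line = ("1\t232864349\t.\tG\t<CNV>\t.\t.\tSVTYPE=CNV;END=232917630"
        "\tGT:TCN:MCN\t./.:2:1\t./.:3:1")
rec = parse_vcf_record(line, ["NORMAL", "TUMOUR"])
print(rec["info"]["END"], rec["samples"]["TUMOUR"])
# 232917630 {'GT': './.', 'TCN': '3', 'MCN': '1'}
```

All values are kept as strings. Convert them with `int()` where you need numbers.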

I need to convert from this format to the PCAWG-11 Calibration format, which looks like this:

chromosome  start   end copy_number minor_cn    major_cn    cellular_prevalence
1   640305  239120876   2   1   1   0.94
2   59261869    91121847    0   0   0   0.88

I was thinking about writing a converter myself, but I seem to be missing some information (I have little to no bioinformatics experience). In particular:

  • Where do I find the start value in the VCF file? Is it the POS column?
  • Where do I find the major_cn value in the VCF file? From what I see, only the minor_cn information is available.
  • How can I calculate the cellular_prevalence field? If I'm right, it should be possible to calculate it somehow.

Also, it would be great if you could point me to an existing converter to spare me the pain of coding it from scratch. I tried googling for converters but didn't find anything useful.

Thank you for your replies.

6.6 years ago
  • Yes, POS should be the start.
  • major_cn is the total copy number (TCN) minus the minor copy number (MCN): major + minor = total, with minor <= major.
  • cellular_prevalence is the fraction of tumor cells that carry this alteration. It looks like your tool does not report this value, so you can set it to 1.
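Putting these three points together, a converter can be quite short. The sketch below is an illustration under assumptions, not a tested tool. It uses only the Python standard library and reads TCN and MCN from the TUMOUR column. It computes major_cn = TCN - MCN and swaps minor and major if needed so that minor <= major. It also writes 1 as a placeholder cellular_prevalence. The function names `vcf_to_calibration` and `convert_file` are hypothetical. It takes POS as the start, as suggested above. Note, however, that the VCF 4.2 specification defines POS for symbolic ALT alleles such as <CNV> as the base preceding the event, so check your caller's documentation in case POS + 1 is more accurate. Records with a missing ('.') TCN or MCN are skipped, and gzipped VCFs are not handled.

```python
CALIBRATION_HEADER = ["chromosome", "start", "end", "copy_number",
                      "minor_cn", "major_cn", "cellular_prevalence"]


def vcf_to_calibration(lines, sample="TUMOUR", prevalence=1.0):
    """Yield one PCAWG-11-style row (a list of strings) per VCF data line."""
    sample_idx = None
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_idx = fields.index(sample)  # column holding the tumour calls
            continue
        if sample_idx is None:
            raise ValueError("data line found before the #CHROM header line")

        # INFO -> {key: value}; partition("=")[::2] keeps (key, value).
        info = dict(kv.partition("=")[::2] for kv in fields[7].split(";"))
        # FORMAT keys (GT:TCN:MCN) paired with this sample's values.
        fmt = dict(zip(fields[8].split(":"), fields[sample_idx].split(":")))
        if fmt.get("TCN", ".") == "." or fmt.get("MCN", ".") == ".":
            continue  # skip records without copy-number values

        total, minor = int(fmt["TCN"]), int(fmt["MCN"])
        major = total - minor  # major + minor = total
        if minor > major:
            minor, major = major, minor  # enforce minor <= major
        # start = POS as-is; some callers may need POS + 1 (padding base).
        yield [fields[0], fields[1], info["END"], str(total),
               str(minor), str(major), str(prevalence)]


def convert_file(vcf_path, out_path, sample="TUMOUR", prevalence=1.0):
    """Write a tab-separated calibration table from an uncompressed VCF."""
    with open(vcf_path) as vcf, open(out_path, "w") as out:
        out.write("\t".join(CALIBRATION_HEADER) + "\n")
        for row in vcf_to_calibration(vcf, sample, prevalence):
            out.write("\t".join(row) + "\n")
```

For example, `convert_file("calls.vcf", "calls_calibration.txt")` (hypothetical file names) would turn the second record in the question into the tab-separated row 1, 232864349, 232917630, 3, 1, 2, 1.0.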

There are a gazillion VCF parsers and libraries, for example https://github.com/vcflib/vcflib or http://bioconductor.org/packages/stats/bioc/VariantAnnotation. Try using the search function here to find more.
