Question: CNVkit - Format of VCF file
fongchunchan (BCCRC) wrote, 3.2 years ago:

I am at the step of deriving the absolute integer copy number for each segment, and the documentation states that one can pass in a VCF file of SNPs in the tumour sample:

cnvkit.py call Sample.cns -y -v Sample.vcf -o Sample.call.cns

This should extract b-allele frequencies and allow the major and minor copy numbers to be calculated. I am having trouble finding the exact VCF format that CNVkit needs in order for this to work.

I've called SNPs using bcftools, in the tumor and the normal separately, and then kept only the positions found in both (using bcftools isec). I then passed in a VCF file of this format:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SampleA
1       926351  .       C       T       11.1    .       DP=2;VDB=0.0106;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-33    GT:PL:GQ        1/1:42,6,0:9
1       1474167 .       A       G       8.65    .       DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=60;FQ=-30       GT:PL:GQ        1/1:38,3,0:5

When I run the command, I get messages like this:

Skipping 1:926351 C; unsure how to get alternative allele count: CallData(GT=1/1, PL=[42, 6, 0], GQ=9)
Skipping 1:1474167 A; unsure how to get alternative allele count: CallData(GT=1/1, PL=[38, 3, 0], GQ=5)

It seems that CNVkit doesn't know how to extract the relevant pieces of information from the VCF file. Does CNVkit accept VCF output from a separate SNP calling tool?

Thanks,

Eric T. (San Francisco, CA) wrote, 3.2 years ago:

Here's the relevant code in CNVkit. The parser checks for FORMAT fields "AD" (the most commonly seen one), "CLCAD2" (a vendor-specific code), or "AO" (I don't remember which caller emits this).
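The lookup order described above can be sketched roughly as follows. This is not CNVkit's actual code: the function name `alt_count` and the plain-dict representation of a sample's FORMAT fields are hypothetical, and the layouts assumed here (AD and CLCAD2 listing per-allele counts with the reference first, AO holding alt counts only) are assumptions to check against the caller that produced the file.

```python
from typing import Mapping, Optional


def alt_count(sample_fields: Mapping[str, object]) -> Optional[int]:
    """Return the alt-allele read count for one sample, or None if no
    recognised FORMAT field is present (hypothetical helper)."""
    # Assumption: AD and CLCAD2 hold per-allele counts, reference first,
    # so the first alt allele's count is at index 1.
    for key in ("AD", "CLCAD2"):
        counts = sample_fields.get(key)
        if isinstance(counts, (list, tuple)) and len(counts) >= 2:
            return int(counts[1])
    # Assumption: AO holds alt counts only, either a single integer
    # or one value per alt allele.
    ao = sample_fields.get("AO")
    if isinstance(ao, (list, tuple)) and ao:
        return int(ao[0])
    if isinstance(ao, int):
        return ao
    return None


# The record from the question carries only GT, PL and GQ, so nothing matches.
print(alt_count({"GT": "1/1", "PL": [42, 6, 0], "GQ": 9}))
print(alt_count({"GT": "0/1", "AD": [12, 9]}))
```

A record with none of these fields gives `None`, which corresponds to the "unsure how to get alternative allele count" message in the question.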

The problem CNVkit has with your VCF file is that the sample-specific data is not stored in the sample-specific columns. Is the alt allele count stored in the INFO column instead, e.g. the "AC1" field? Did bcftools put it there? If this is a standard thing that other users are likely to have then I can add a check in CNVkit to extract this field if it's there. Otherwise, could you try copying the relevant INFO field into the sample column using the "AD" or "AO" field?


Thanks for the reply.

Based on the VCF header produced by bcftools:

INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">

So it would appear that the allele counts are in the DP4 field, comma-separated. This is the direct output of bcftools. I'll try another germline mutation caller that writes the allele data into AD.
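As an alternative to switching callers, the suggestion of copying the INFO data into the sample column could be sketched as below. This is only an illustration under stated assumptions: the helper name `add_ad_from_dp4` is hypothetical, it handles tab-separated single-sample records only, and it treats DP4 as (ref-forward, ref-reverse, alt-forward, alt-reverse) per the header quoted above. DP4 counts only high-quality bases and does not separate multiple alt alleles, so the derived AD is an approximation. A matching `##FORMAT` header line for AD would likely also need to be added to the file.

```python
def add_ad_from_dp4(vcf_line: str) -> str:
    """Append an AD value derived from INFO/DP4 to a single-sample VCF
    record (hypothetical helper; other lines are returned unchanged)."""
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through untouched
    body = vcf_line.rstrip("\n")
    ending = vcf_line[len(body):]  # preserve the original line ending
    cols = body.split("\t")
    if len(cols) != 10:
        return vcf_line  # not a single-sample record; leave as-is
    # Parse key=value pairs from the INFO column (flag entries are skipped).
    info = dict(item.split("=", 1) for item in cols[7].split(";") if "=" in item)
    if "DP4" not in info:
        return vcf_line
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    cols[8] += ":AD"
    cols[9] += f":{ref_fwd + ref_rev},{alt_fwd + alt_rev}"
    return "\t".join(cols) + ending


record = (
    "1\t926351\t.\tC\tT\t11.1\t.\t"
    "DP=2;VDB=0.0106;AF1=1;AC1=2;DP4=0,0,1,1;MQ=60;FQ=-33\t"
    "GT:PL:GQ\t1/1:42,6,0:9\n"
)
print(add_ad_from_dp4(record), end="")
```

For the first record in the question, DP4=0,0,1,1 becomes AD=0,2, that is, zero reference reads and two alt reads.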

(reply written 3.2 years ago by fongchunchan)