Format of VCF for CNVkit call function and results
5.0 years ago
Hällyss ▴ 80

Hello,

I'm trying to use the --vcf option of the call command, and I have a few questions:

  • What format should the VCF have? I have generated several VCFs with different tools, and some of them do not work. I can see what is missing (especially the FORMAT column), but I would like to reformat the files properly in one go.

  • When the command works, I get results like these:

    Chromosome start end gene log2 baf cn cn1 cn2 depth probes weight
    chr17 500 22157351 TP53_ex8-7,TP53_ex6,TP53_ex5-4b,TP53_ex3-2 0.0624461 0.316919 2 1 1 470.8 55 22.1944
    chr17 25268558 41219812 ERBB2_ex10-11,ERBB2_ex12,ERBB2_ex13-14 7 0.00222809 2 841.005 128 43.7848
    chr17 41222646 41267900 BRCA1_ex7,BRCA1_ex6,BRCA1_ex5 -2.35942 0 0 0 207.013 111 36.6061

I would like to know what the cn, cn1 and cn2 columns correspond to, and why the baf, cn1 and cn2 columns are not always filled in.

Thank you for your answer,

Alice

5.0 years ago
Eric T. ★ 2.7k

There is an example VCF in the test/formats/ directory of the CNVkit source. Some more guidance is in the online docs.

The FORMAT column is not used much, but try to ensure the INFO column has DP values and the FORMAT/sample columns have GT and AD or AO. Tumor/normal pairing will be detected automatically if you have a PEDIGREE tag; otherwise, use the -i and -n options to specify the sample IDs.
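As a quick pre-flight check before running call, the sketch below scans a VCF and reports whether it has the fields mentioned above: DP in INFO, at least one sample column, and GT plus AD or AO in FORMAT. The function names are hypothetical, and this only checks that the keys are present (in the first record), not that CNVkit will accept the file.

```python
import gzip


def check_vcf_lines(lines):
    """Report whether VCF lines carry the keys mentioned above.

    Hypothetical helper: it inspects only the #CHROM header line and the
    first data record, so it is a quick sanity check, not a validator.
    """
    report = {"has_sample_column": False, "info_DP": False,
              "format_GT": False, "format_AD_or_AO": False}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Column 9 is FORMAT; columns 10 onward are samples.
            report["has_sample_column"] = len(fields) > 9
            continue
        # First data record: collect INFO keys (flags have no "=").
        info_keys = {item.split("=", 1)[0] for item in fields[7].split(";")}
        report["info_DP"] = "DP" in info_keys
        if len(fields) > 9:
            format_keys = set(fields[8].split(":"))
            report["format_GT"] = "GT" in format_keys
            report["format_AD_or_AO"] = bool({"AD", "AO"} & format_keys)
        break
    return report


def check_vcf_fields(path):
    """Open a plain or gzip-compressed VCF and run check_vcf_lines on it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return check_vcf_lines(handle)
```

For a sites-only file such as the LoFreq extract posted later in this thread, only info_DP should come back True, which points at the missing FORMAT/sample columns.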

To test interactively, you can use the API:

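from cnvlib import tabio

filename = "example.vcf"  # hypothetical path; point this at your own VCF
v = tabio.read(filename, "vcf")  # parse the VCF into a table of variants
snps = v.heterozygous()  # keep only the heterozygous sites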

The call command can do a variety of things. See the allele frequencies section for what happens when a VCF is given. The baf column should be output when a VCF is given, and also cn1 and cn2 unless --method none is specified.
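On the cn/cn1/cn2 question: as far as I can tell from reading CNVkit's do_call function, cn is the segment's total integer copy number, and cn1/cn2 split it into major- and minor-allele copy numbers using the segment's BAF. A segment with no heterozygous SNVs gets an empty baf, and its cn1/cn2 are then left empty too, except when cn is 0 (a homozygous deletion, like the BRCA1 row in the question), where both are 0. The following is my own paraphrase of that logic for a single segment, not CNVkit's code; the function name and the simple ploidy * 2**log2 conversion (purity 1, no re-centering) are illustrative assumptions, so CNVkit's own numbers may differ.

```python
import math


def split_allele_copies(log2, baf, cn, ploidy=2):
    """Split a segment's total copy number `cn` into (cn1, cn2).

    Paraphrase of CNVkit's behaviour as I understand it, assuming purity 1:
    cn1 is the major allele, cn2 the minor allele.
    Returns (None, None) when there is no BAF and cn > 0.
    """
    no_baf = baf is None or math.isnan(baf)
    if no_baf and cn > 0:
        return None, None  # no heterozygous SNVs: alleles cannot be split
    # Fold BAF around 0.5 so it always refers to the more abundant allele.
    upper_baf = 1.0 if no_baf else abs(baf - 0.5) + 0.5
    absolute_cn = ploidy * 2 ** log2  # un-rounded total copy number
    cn1 = round(absolute_cn * upper_baf)
    cn1 = min(max(cn1, 0), cn)  # keep 0 <= cn1 <= cn
    return cn1, cn - cn1


print(split_allele_copies(0.0624461, 0.316919, 2))  # → (1, 1)
print(split_allele_copies(-2.35942, None, 0))       # → (0, 0)
```

Applied to the TP53 and BRCA1 rows from the question, this reproduces the cn1/cn2 values CNVkit reported there.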

If you're seeing bugs, they might be fixed in the development version of CNVkit on GitHub.

5.0 years ago
Hällyss ▴ 80

Hello,

I looked for a way to convert a VCF (from LoFreq) into the input format CNVkit expects, but to no avail.

I have seen in the code that the allele frequencies are calculated from the VCF data. My VCF already has AF calculated; wouldn't it be simpler to use this value directly?

Otherwise, here is an extract from my VCF as well as the console error:


CHROM POS ID REF ALT QUAL FILTER INFO
chr2 29446202 . G A . PASS DP=1050;AF=0.538095;SB=0;DP4=245,236,290,275;CONSVAR
chr2 212569983 . G A . PASS DP=312;AF=1.000000;SB=0;DP4=0,0,184,128;CONSVAR


Traceback (most recent call last):
  File "/usr/local/bin/cnvkit.py", line 5, in <module>
    pkg_resources.run_script('CNVkit==0.8.3.dev0', 'cnvkit.py')
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1501, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/site-packages/CNVkit-0.8.3.dev0-py2.7.egg/EGG-INFO/scripts/cnvkit.py", line 13, in <module>
  File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 848, in _cmd_call
  File "build/bdist.linux-x86_64/egg/cnvlib/commands.py", line 869, in do_call
  File "build/bdist.linux-x86_64/egg/cnvlib/vary.py", line 53, in baf_by_ranges
  File "build/bdist.linux-x86_64/egg/cnvlib/genome/gary.py", line 398, in into_ranges
  File "build/bdist.linux-x86_64/egg/cnvlib/genome/intersect.py", line 69, in into_ranges
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "/usr/local/lib/python2.7/site-packages/pandas/indexes/base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:4154)
  File "pandas/index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas/index.c:4018)
  File "pandas/hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12368)
  File "pandas/hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12322)
KeyError: 'alt_freq'
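The KeyError: 'alt_freq' is consistent with the VCF above having no FORMAT or sample column, so there are no per-sample GT and AD/AO values (which the answer above says CNVkit looks for) to derive an allele frequency from. One possible workaround, untested with CNVkit, is to synthesize a sample column from LoFreq's INFO fields. The Python 3 sketch below takes AD from DP4 (ref-forward, ref-reverse, alt-forward, alt-reverse read counts) and guesses GT from AF; the function name, the sample name TUMOR and the 0.9 homozygous cutoff are arbitrary assumptions, and it assumes one ALT allele per record.

```python
FORMAT_HEADERS = [
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths (ref,alt)">',
    '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">',
]


def add_sample_column(lines, sample="TUMOR", hom_cutoff=0.9):
    """Yield VCF lines with a GT:AD:DP sample column built from INFO.

    Hypothetical converter for sites-only LoFreq output: AD comes from
    DP4 (ref-fwd, ref-rev, alt-fwd, alt-rev); GT is guessed from AF.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line  # keep existing meta-information lines
        elif line.startswith("#CHROM"):
            yield from FORMAT_HEADERS  # declare the new FORMAT keys first
            yield line + "\tFORMAT\t" + sample
        elif line:
            fields = line.split("\t")
            # Parse KEY=VALUE pairs; flags such as CONSVAR are skipped.
            info = dict(item.split("=", 1)
                        for item in fields[7].split(";") if "=" in item)
            ref_f, ref_r, alt_f, alt_r = (int(x) for x in info["DP4"].split(","))
            ref_depth, alt_depth = ref_f + ref_r, alt_f + alt_r
            gt = "1/1" if float(info["AF"]) >= hom_cutoff else "0/1"
            sample_field = f"{gt}:{ref_depth},{alt_depth}:{info['DP']}"
            yield "\t".join(fields[:8] + ["GT:AD:DP", sample_field])
```

Write the yielded lines (plus newlines) to a new file and pass that to call --vcf, using -i TUMOR if the sample is not picked up automatically. For the first record above this produces 0/1:481,565:1050.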

Thank you


Please use ADD COMMENT to reply to earlier posts, so that this thread remains logically structured and easy to follow.


Is there a FORMAT column and at least one sample column in your VCF? If so, which keys are available there?
