Question: Valid or invalid vcf? REF nucleotide is not matching reference in vcf lines where ALT equals <CNV>
2.7 years ago by
dschika300
European Union
dschika300 wrote:

Hi,

I received a VCF file, and the first thing I checked was whether the first nucleotide given in REF matches the nucleotide in the reference at position POS. This was true for lines with SNPs, indels, and breakends, but not for lines with "<CNV>" as ALT. In those <CNV> lines the REF nucleotide always matches position (POS+1) in the reference.

In the VCF specification it is written that

If any of the ALT alleles is a symbolic allele (an angle-bracketed ID String “<ID>”) then the padding base is required and POS denotes the coordinate of the base preceding the polymorphism.

Does this mean that if there is a symbolic allele like <CNV>, the nucleotide given in REF is in fact at position (POS+1) and the vcf I received is valid?

Or should the nucleotide given at REF always match the nucleotide at position POS in the reference (meaning the vcf I received contains some invalid lines)?

Thanks

cnv vcf • 1.1k views
modified 2.7 years ago by Pierre Lindenbaum129k • written 2.7 years ago by dschika300

Do you have an END attribute in the INFO column when ALT is <CNV>?

written 2.7 years ago by Pierre Lindenbaum129k

Yes. Here is an example line:

chr1    11174372    .   A   <CNV>   100.0   PASS    FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=42939;NUMTILES=7;SD=0.21;REF_CN=2;CI=0.05:1.60885,0.95:2.43413;RAW_CN=1.99   GT:GQ:CN    ./.:0:1.99

Nucleotide at position 11174372 on chr1 is C. Nucleotide at position 11174373 on chr1 is A.
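
The comparison described above can be sketched in Python. This is only an illustration: `KNOWN_BASES` is a hypothetical stand-in for a real reference lookup and holds just the two bases quoted above, and `ref_offset` is a helper name made up for this example. The record is split on whitespace, which works for this line but would not be safe for INFO values containing spaces.

```python
# Example record from this thread (fields separated by single spaces here).
record = (
    "chr1 11174372 . A <CNV> 100.0 PASS "
    "FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=42939;NUMTILES=7;SD=0.21;"
    "REF_CN=2;CI=0.05:1.60885,0.95:2.43413;RAW_CN=1.99 GT:GQ:CN ./.:0:1.99"
)

# Hypothetical toy reference: only the two bases stated in the thread.
KNOWN_BASES = {("chr1", 11174372): "C", ("chr1", 11174373): "A"}


def ref_offset(line, bases):
    """Return 0 if REF[0] matches the base at POS, 1 if it matches POS+1, else None."""
    chrom, pos, _id, ref = line.split()[:4]
    pos = int(pos)
    for offset in (0, 1):
        if bases.get((chrom, pos + offset)) == ref[0].upper():
            return offset
    return None


offset = ref_offset(record, KNOWN_BASES)
print(offset)
```

A result of 0 means REF agrees with the reference at POS, which is what the VCF specification's definition of REF and POS calls for; a result of 1 reproduces the shift seen in the <CNV> lines.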

written 2.7 years ago by dschika300
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum129k wrote:

See the VCF spec: https://samtools.github.io/hts-specs/VCFv4.2.pdf

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">

For precise variants, END is POS + length of REF allele - 1, and for imprecise variants the corresponding best estimate.
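
As a rough check of that arithmetic against the example record posted above, here is a small Python sketch. The helper names `parse_info` and `expected_end_precise` are made up for this illustration, and only a subset of the INFO fields from the example line is used.

```python
def parse_info(info):
    """Turn 'K1=V1;K2=V2;FLAG' into a dict; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def expected_end_precise(pos, ref):
    """END for a precise variant: POS + length of REF allele - 1."""
    return pos + len(ref) - 1


pos = 11174372
info = parse_info("FR=.;PRECISE=FALSE;SVTYPE=CNV;END=11217311;LEN=42939")
end = int(info["END"])

# END that the precise-variant formula would give for a one-base REF.
print(expected_end_precise(pos, "A"))
# Number of bases between POS (exclusive) and END (inclusive).
print(end - pos)
# Compare that span with the LEN value reported in the record.
print(end - pos == int(info["LEN"]))
```

In this record PRECISE=FALSE, so END is a best estimate rather than the output of the formula. The difference END - POS equals the reported LEN, which would fit an event spanning POS+1 to END with POS as the preceding padding base; that is a reading of these numbers, not something the record states.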
written 2.7 years ago by Pierre Lindenbaum129k

Thanks, Pierre.

I am still not sure I understood the manual. Does it mean that in such CNV lines POS equals the position of the REF base minus 1?

written 2.7 years ago by dschika300