VCF: strange representation of ALT and REF
7.7 years ago
sacha ★ 2.4k

Hi,

In a VCF file, I have the following line:

chr1    16571   .       GCCAGAAATC      ACCAGAAATG

I would like to understand why this representation is used. It seems to be exactly the same as two SNPs. Or am I wrong?

chr1    16571   .       G      A
chr1    16580   .       C      G
vcf variant • 2.0k views

Yes, it may be as you wrote. I am not familiar with this kind of representation in a VCF file; this is the first time I have seen it. Out of curiosity: which tool did you use to create the VCF file?


I think it's a gVCF from GATK.

7.7 years ago
William ★ 5.3k

You probably used a haplotype-based caller such as FreeBayes or GATK HaplotypeCaller to call your variants.

Have a look at converting haplotypes to allelic primitives:

https://software.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_VariantsToAllelicPrimitives.php

https://github.com/vcflib/vcflib#vcfallelicprimitives
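To make the idea of allelic primitives concrete, here is a minimal Python sketch that splits the record from the question into per-base SNPs. It is not the algorithm used by either tool linked above; it only compares REF and ALT position by position and assumes both alleles have the same length. The function name `decompose_mnp` is made up for this illustration.

```python
def decompose_mnp(chrom, pos, ref, alt):
    """Split an equal-length REF/ALT pair into per-base SNP records.

    Hypothetical helper for illustration only: real tools also handle
    indels and complex alleles, which this sketch does not.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length alleles")
    return [
        (chrom, pos + offset, ref_base, alt_base)
        for offset, (ref_base, alt_base) in enumerate(zip(ref, alt))
        if ref_base != alt_base
    ]


snps = decompose_mnp("chr1", 16571, "GCCAGAAATC", "ACCAGAAATG")
for record in snps:
    print(*record, sep="\t")
```

Note that splitting the record this way drops the information that both changes were called together on the same haplotype.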

You might also find this interesting: normalization of variant calls.

http://genome.sph.umich.edu/wiki/Variant_Normalization

7.7 years ago

By making it a single entry containing both variants, the caller expresses that you have no reads like these:

ACCAGAAATC

GCCAGAAATG

All the reads with a G at the first position that also cross the second position have a C there; all the ones with an A at the first position have a G at the second position.
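Following the interpretation in this answer, the point can be sketched in Python by enumerating every haplotype you could build by mixing REF and ALT bases at the sites where they differ. The helper name `candidate_haplotypes` is hypothetical, and the sketch assumes REF and ALT have equal length.

```python
from itertools import product


def candidate_haplotypes(ref, alt):
    """Return every haplotype built by choosing the REF or ALT base
    at each position where the two alleles differ.

    Assumes len(ref) == len(alt); hypothetical helper for illustration.
    """
    differing = [i for i, (r, a) in enumerate(zip(ref, alt)) if r != a]
    haplotypes = []
    for choice in product((False, True), repeat=len(differing)):
        seq = list(ref)
        for use_alt, i in zip(choice, differing):
            if use_alt:
                seq[i] = alt[i]
        haplotypes.append("".join(seq))
    return haplotypes


ref, alt = "GCCAGAAATC", "ACCAGAAATG"
all_haps = candidate_haplotypes(ref, alt)
# The single record lists only REF and ALT; the mixed combinations are absent.
mixed = [h for h in all_haps if h not in (ref, alt)]
print(mixed)
```

Two separate SNP records would leave all four combinations open, whereas the single record names only two of them.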

7.7 years ago
microfuge ★ 1.9k

Could you run the VCF through vcf-validator and see if it reports any errors? It could be a complex substitution. From the VCF 4.2 specification:

"REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive). Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; this padding base is not required (although it is permitted) for e.g. complex substitutions or other events where all alleles have at least one base represented in their Strings. If any of the ALT alleles is a symbolic allele (an angle-bracketed ID String "<id>") then the padding base is required and POS denotes the coordinate of the base preceding the polymorphism. Tools processing VCF files are not required to preserve case in the allele Strings. (String, Required)."

Also check page 8 of the specification, available here: https://samtools.github.io/hts-specs/VCFv4.2.pdf
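As a rough illustration of the quoted rule (not a replacement for vcf-validator), the following Python sketch checks that a REF string uses only the permitted bases and tests whether REF and ALT start with the same base, which is what a padding base would look like. The function names `is_valid_ref` and `shares_first_base` are invented for this example.

```python
import re

# Bases permitted in REF by the quoted text, case insensitive.
_ALLOWED_BASES = re.compile(r"[ACGTN]+", re.IGNORECASE)


def is_valid_ref(ref):
    """True if REF is non-empty and contains only A, C, G, T or N."""
    return bool(_ALLOWED_BASES.fullmatch(ref))


def shares_first_base(ref, alt):
    """True if REF and ALT begin with the same base (a possible padding base)."""
    return bool(ref) and bool(alt) and ref[0].upper() == alt[0].upper()


ref, alt = "GCCAGAAATC", "ACCAGAAATG"
print(is_valid_ref(ref))            # REF uses only permitted bases
print(shares_first_base(ref, alt))  # first bases differ: G vs A
```

For the record in the question the first bases differ, so there is no padding base, which the quoted text permits because both alleles have at least one base represented.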
