In the VCF format, what is the correct way to represent an insertion that cannot be positioned (missing POS column)?
12.4 years ago

Hi all,

The VCF 4.1 file format specification states that the POS field is required. But suppose that you compare two genomes, a reference and an assembly, and find a big insertion in the assembly that you can't map to the reference unambiguously. Let me show an example, using signed numbers to denote large conserved regions (synteny blocks), where the sign gives the orientation.

Reference is: +1 +2 +3 +2 +4
Assembly is: -1 +3 +4 -2 +5 -2

You see that +5 is a unique sequence that is not homologous to any sequence in the reference. But due to rearrangements, it's very hard to find the actual position of +5 in the reference. This situation is very common in bacteria, even within the same species (different strains). What is a proper way to report it in VCF?
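To make the example concrete, here is a minimal Python sketch. It assumes synteny blocks are encoded as signed integers (sign = orientation); the function names `unique_blocks` and `flank_positions` are hypothetical and only illustrate the problem, they do not come from any VCF tool.

```python
def unique_blocks(reference, assembly):
    """Return (index, block, left, right) for each assembly block
    whose ID (ignoring orientation) never occurs in the reference."""
    ref_ids = {abs(b) for b in reference}
    found = []
    for i, block in enumerate(assembly):
        if abs(block) not in ref_ids:
            left = assembly[i - 1] if i > 0 else None
            right = assembly[i + 1] if i + 1 < len(assembly) else None
            found.append((i, block, left, right))
    return found


def flank_positions(reference, block):
    """Return every reference index where a flanking block occurs,
    ignoring orientation. More than one hit means the anchor is ambiguous."""
    return [i for i, b in enumerate(reference) if abs(b) == abs(block)]


reference = [+1, +2, +3, +2, +4]
assembly = [-1, +3, +4, -2, +5, -2]

for index, block, left, right in unique_blocks(reference, assembly):
    print(block, "flanked by", left, "and", right)
    print("left flank occurs in reference at", flank_positions(reference, left))
```

Here +5 is flanked on both sides by block 2, which occurs twice in the reference, so neither flank gives a single reference coordinate to use as POS.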

P.S. VCF validator from vcftools doesn't permit '.' in the POS column.


I don't think VCF is designed for your use case.


Which solution would you suggest? A custom file format for such cases? Or would it be reasonable to extend the VCF format for this?


Thank you, Jeremy, it is a very interesting link!
