Question: Interpreting Gaps at Pos 0 in Terms of VCF
pld (United States) wrote, 5.2 years ago:

I'm writing a Python script to convert Clustal-formatted alignments into VCF files. I'm stuck on one thing: how to interpret a gap at the start of an alignment:


ENG1-REF-K      ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGCAGAACTTTGATTTT
MERS_EMC_V      ---------------------------------------------CAGAACTTTGATTTT
                                                             ***************

The VCF format seems to assume that there is a base upstream of every deletion. For example, if the reference is ACGT and the sample is A-GT, the VCF record should be REF: AC, ALT: A. The deleted base is at position 2, but the POS of the record is 1, because VCF reports the position of the padding base before the deletion.
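A minimal sketch of that padding for the usual (interior) case, assuming an ungapped reference string and a 1-based, inclusive deletion interval. The function name `vcf_deletion_with_left_anchor` is hypothetical, not part of any library; it deliberately refuses deletions that start at position 1, which is exactly the case asked about here.

```python
def vcf_deletion_with_left_anchor(ref_seq, del_start, del_end):
    """Hypothetical helper: build (POS, REF, ALT) for a deletion of the
    1-based reference positions del_start..del_end (inclusive), padded with
    the base immediately before the deletion.

    Only valid when del_start > 1, i.e. when an upstream base exists.
    """
    if del_start < 2:
        raise ValueError("no upstream base: deletion starts at position 1")
    pos = del_start - 1              # VCF POS points at the padding (anchor) base
    ref = ref_seq[pos - 1:del_end]   # anchor base followed by the deleted bases
    alt = ref_seq[pos - 1]           # anchor base only
    return pos, ref, alt


# Reference ACGT, sample A-GT: the C at position 2 is deleted.
print(vcf_deletion_with_left_anchor("ACGT", 2, 2))  # (1, 'AC', 'A')
```

Note the off-by-one: the deletion interval starts at 2, but POS is 1 because the record is anchored on the preceding base.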

http://samtools.github.io/hts-specs/VCFv4.2.pdf

How are terminal deletions (gaps at the very start of the sequence, where there is no upstream base) represented in VCF?

Tags: msa, alignment, clustal, vcf
Zhaorong (State College, PA) wrote, 5.2 years ago:

From the VCF (Variant Call Format) version 4.1 specification (the same rule appears in 4.2):

"the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event".

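So in your example, the deletion covers reference positions 1-45, and the padding base is the first base after it (the C at position 46):

POS = 1
REF = ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGC
ALT = C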
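A sketch of how a converter might apply this rule to a pair of aligned rows. The helper `deletions_from_alignment` is hypothetical, and it rests on simplifying assumptions stated in its docstring: the reference row is the whole contig starting at position 1 with no gaps, and insertions and mismatches (including a mismatch at the padding base) are not handled.

```python
def deletions_from_alignment(ref_row, alt_row):
    """Hypothetical helper: yield (POS, REF, ALT) for each run of gaps in alt_row.

    Assumptions (illustrative only):
    - ref_row and alt_row are equal-length rows of a pairwise alignment;
    - ref_row is the full contig starting at position 1 and has no gaps,
      so 0-based alignment column i is reference position i + 1;
    - insertions and mismatches are ignored, and a run that deletes the
      entire contig is skipped because no padding base exists.
    """
    if len(ref_row) != len(alt_row):
        raise ValueError("alignment rows differ in length")
    if "-" in ref_row:
        raise ValueError("this sketch assumes no gaps in the reference row")
    n = len(ref_row)
    i = 0
    while i < n:
        if alt_row[i] != "-":
            i += 1
            continue
        start = i + 1                  # 1-based first deleted position
        while i < n and alt_row[i] == "-":
            i += 1
        end = i                        # 1-based last deleted position
        if start > 1:
            # Usual case: pad with the base before the event; POS is that base.
            pos = start - 1
            yield pos, ref_row[pos - 1:end], ref_row[pos - 1]
        elif end < n:
            # Event at position 1: pad with the base after the event instead.
            yield 1, ref_row[:end + 1], ref_row[end]


ref_row = "ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGCAGAACTTTGATTTT"
alt_row = "-" * 45 + "CAGAACTTTGATTTT"
for pos, ref, alt in deletions_from_alignment(ref_row, alt_row):
    print(pos, ref, alt)
# 1 ATTTAAGTGAATAGCTTGGCTATCTCACTTCCCCTCGTTCTCTTGC C
```

The same function gives (1, 'AC', 'A') for the ACGT / A-GT example in the question, so both branches of the rule go through one code path.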
