SV with identical start and end positions
20 months ago
Pac314 ▴ 10

There are many insertions with identical start and end positions in an SV VCF produced by Manta. Are these inversions? I am not sure what these variant entries represent and many of them have long SV lengths reported.


Please include an example of such an SV. There are many possible reasons. IIRC, for Manta they're likely to be either insertions or calls in breakpoint notation.


Thanks for your reply.

Ok so here is an example with identical start and end positions without an SV length:

chr1    1020061 MantaINS:7:1415:1415:0:4:0  C   <INS>   74  PASS    END=1020061;SVTYPE=INS;CIPOS=0,8;CIEND=0,8;HOMLEN=8;HOMSEQ=CCCCCCCC;LEFT_SVINSSEQ=CCCCCCCCCCCCCCCCCCCCCCCCGCCCCCCCCCCCCGGG;RIGHT_SVINSSEQ=GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGTTGGCTGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGAGGAGAGAGGGGGAGGGGCGCCGCCCTGGCCCCG

and another with an SV length provided:

chr1    1068824 MantaINS:93477:0:0:0:0:0    G   GGCCACGCGGGCTGTGCAGATGCAGGTGCGGCGGGGCGGGGCCACGCGGGCTGTGAAGGTGCAGGTGCGGCGGGGCAGA 999 PASS    END=1068824;SVTYPE=INS;SVLEN=78;CIGAR=1M78I;CIPOS=0,10;HOMLEN=10;HOMSEQ=GCCACGCGGG  

The second example is a 'normal' insertion, for which Manta provides the full, exact inserted sequence in the ALT column.

The first example is an insertion longer than the library fragment size. Manta uses its own custom LEFT_SVINSSEQ and RIGHT_SVINSSEQ fields to give you the sequences at the start and the end of the inserted sequence. The insertion is too long for Manta to assemble in full, so it can't report the complete sequence, and it doesn't report an SVLEN because it only knows the length is longer than what it can assemble.
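If it helps, here is a minimal Python sketch for telling the two kinds of record apart in a Manta VCF. The function names (`parse_info`, `classify_insertion`) and the shortened records are made up for illustration and are not part of Manta. It assumes one ALT allele per line and the INFO keys shown in the examples above.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-style keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def classify_insertion(vcf_line):
    """Classify one VCF data line by how completely the insertion is described.

    Assumes a single ALT allele and at least the 8 fixed VCF columns.
    """
    cols = vcf_line.split()
    pos, alt = int(cols[1]), cols[4]
    info = parse_info(cols[7])

    if info.get("SVTYPE") != "INS":
        kind = "not an insertion"
    elif not alt.startswith("<"):
        # The inserted sequence is spelled out in ALT.
        kind = "fully assembled"
    elif "LEFT_SVINSSEQ" in info or "RIGHT_SVINSSEQ" in info:
        # Symbolic <INS> with only the flanks of the inserted sequence.
        kind = "partially assembled"
    else:
        kind = "symbolic, no flanking sequence"

    svlen = info.get("SVLEN")
    return {
        "pos": pos,
        "end": int(info["END"]) if "END" in info else None,
        "kind": kind,
        "svlen": int(svlen) if svlen is not None else None,
    }


# Hypothetical, shortened records modelled on the two examples above.
records = [
    "chr1\t100\tins_a\tC\t<INS>\t74\tPASS\t"
    "END=100;SVTYPE=INS;LEFT_SVINSSEQ=CCCG;RIGHT_SVINSSEQ=GGTT",
    "chr1\t200\tins_b\tG\tGACGT\t999\tPASS\tEND=200;SVTYPE=INS;SVLEN=4",
]
for line in records:
    print(classify_insertion(line))
```

Records that come back as "partially assembled" are the ones where a missing SVLEN is expected rather than a sign of a problem.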

20 months ago
d-cameron ★ 2.9k

"Are these inversions?"

No, they are insertions. END is defined in version 4.3 of the VCF specifications as:

End reference position (1-based), indicating the variant spans positions POS–END on reference/contig CHROM. Normally this is the position of the last base in the REF allele, so it can be derived from POS and the length of REF, and no END INFO field is needed. However when symbolic alleles are used, e.g. in gVCF or structural variants, an explicit END INFO field provides variant span information that is otherwise unknown.

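A clean insertion is expected to have an identical POS and END, because the inserted bases sit between two adjacent reference positions and span no reference sequence. Your records are as expected.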

"I am not sure what these variant entries represent and many of them have long SV lengths reported."

For insertions, SVLEN is the number of inserted bases.
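As a rough sanity check, the two points above (POS equals END, and SVLEN counts the inserted bases) can be tested per record. This is only a sketch: `inserted_length` and `check_insertion` are hypothetical helpers, and the length rule assumes the left-anchored form used in the second example, where ALT starts with the REF base(s) followed by the inserted sequence.

```python
def inserted_length(ref, alt):
    """Number of inserted bases for a sequence-resolved insertion.

    Returns None when ALT is symbolic (e.g. <INS>), is in breakend
    notation, or does not start with REF, since the length cannot be
    derived from the alleles in those cases.
    """
    ref, alt = ref.upper(), alt.upper()
    if not alt or not set(alt) <= set("ACGTN"):
        return None
    if not alt.startswith(ref):
        return None
    return len(alt) - len(ref)


def check_insertion(pos, end, ref, alt, svlen=None):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    if end != pos:
        problems.append(f"END ({end}) differs from POS ({pos})")
    derived = inserted_length(ref, alt)
    if derived is not None and svlen is not None and derived != svlen:
        problems.append(f"SVLEN ({svlen}) differs from ALT-derived length ({derived})")
    return problems


# Hypothetical values, not taken from a real call set.
print(check_insertion(200, 200, "G", "GACGT", svlen=4))
print(check_insertion(200, 205, "G", "GACGT", svlen=3))
```

For symbolic `<INS>` records the length check is skipped, which matches the case where Manta cannot assemble the whole insertion and leaves SVLEN out.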
