Structural variant breakend
18 months ago
A_heath ▴ 120

Hi all,

I received structural variant calling results from comparing two bacterial genomes, and several of the structural variants are named "Breakends" and characterized as "SYMBOLIC" type. I've tried to search the literature for it, but there does not seem to be a proper definition.

Could someone help me understand what a breakend actually is?

18 months ago
cmdcolin ★ 2.9k

The best place to start is probably section 5.4 of the VCF specification: https://samtools.github.io/hts-specs/VCFv4.3.pdf

It describes breakends in detail and is pretty accessible to read. That said, breakends can be difficult to interpret properly: it can require putting together multiple records (i.e. multiple lines) of the VCF file to understand what is going on.
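As a rough illustration of what "putting together multiple records" means, here is a minimal Python sketch that groups breakend records by their INFO `EVENT` tag and pairs mates via `MATEID`. It assumes plain, whitespace-separated VCF data lines with the eight fixed columns; the function names are hypothetical and this is not a full VCF parser (a library such as pysam is a better choice for real files).

```python
from collections import defaultdict

def parse_info(info):
    """Split an INFO string like 'SVTYPE=BND;MATEID=bnd_U' into a dict.
    Flags without '=' are stored with the value True."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item:
            fields[item] = True
    return fields

def group_breakends(lines):
    """Group SVTYPE=BND records by EVENT (a record without EVENT forms its own group)."""
    events = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, rec_id, ref, alt, qual, filt, info = line.split()[:8]
        fields = parse_info(info)
        if fields.get("SVTYPE") != "BND":
            continue
        record = {"chrom": chrom, "pos": int(pos), "id": rec_id,
                  "ref": ref, "alt": alt, "mate": fields.get("MATEID")}
        events[fields.get("EVENT", rec_id)].append(record)
    return dict(events)

def mate_pairs(records):
    """Return each (id, mate_id) pair once, in first-seen order."""
    seen, pairs = set(), []
    for rec in records:
        key = frozenset((rec["id"], rec["mate"]))
        if rec["mate"] and key not in seen:
            seen.add(key)
            pairs.append((rec["id"], rec["mate"]))
    return pairs
```

Fed the four breakend lines of the spec's inversion example, this should give one group, `INV0`, holding two mate pairs: `bnd_W`/`bnd_U` and `bnd_V`/`bnd_X`.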

For example, in VCF you could use the symbolic allele <INV> to represent an inversion (this example is from the spec above):

#CHROM POS ID REF ALT QUAL FILTER INFO
2 321682 INV0 T <INV> 6 PASS SVTYPE=INV;END=421681


Or the same inversion could be described using breakends:

#CHROM POS ID REF ALT QUAL FILTER INFO
2 321681 bnd_W G G]2:421681] 6 PASS SVTYPE=BND;MATEID=bnd_U;EVENT=INV0
2 321682 bnd_V T [2:421682[T 6 PASS SVTYPE=BND;MATEID=bnd_X;EVENT=INV0
2 421681 bnd_U A A]2:321681] 6 PASS SVTYPE=BND;MATEID=bnd_W;EVENT=INV0
2 421682 bnd_X C [2:321682[C 6 PASS SVTYPE=BND;MATEID=bnd_V;EVENT=INV0
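The ALT field of each breakend record encodes the join in one of four bracket forms (t[p[, t]p], ]p]t, [p[t, per section 5.4 of the spec). Below is a minimal Python sketch of decoding one mated breakend ALT string; the `Breakend` class and `parse_bnd_alt` name are hypothetical, and single breakends (no brackets) are deliberately rejected rather than handled.

```python
import re
from dataclasses import dataclass

# Mated breakend ALT: optional bases, a bracket, "chrom:pos", the same bracket, optional bases.
# The chrom part may itself contain ':'; the regex backtracks to the last ':' before the digits.
_BND_ALT = re.compile(
    r"^(?P<pre>[A-Za-z]*)(?P<br>[\[\]])(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=br)(?P<post>[A-Za-z]*)$"
)

@dataclass
class Breakend:
    mate_chrom: str
    mate_pos: int
    joined: str        # "after": mate piece follows our bases (t[p[, t]p]); "before" otherwise
    mate_extends: str  # "right" for '[' (piece runs rightward from p), "left" for ']'
    inverted: bool     # True when the mate piece is joined as its reverse complement

def parse_bnd_alt(alt):
    m = _BND_ALT.match(alt)
    # Exactly one side must carry the bases t; anything else is not a mated breakend.
    if m is None or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a mated breakend ALT: {alt!r}")
    joined = "after" if m["pre"] else "before"
    mate_extends = "right" if m["br"] == "[" else "left"
    # t[p[ and ]p]t keep orientation; t]p] and [p[t join the reverse complement.
    inverted = (joined == "after") == (mate_extends == "left")
    return Breakend(m["chrom"], int(m["pos"]), joined, mate_extends, inverted)
```

For the inversion above, all four records decode with `inverted=True`, which is what you would expect: every junction joins a segment to the reverse complement of another.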


You might ask: why represent it with breakends when the simpler <INV> representation exists? One reason is that the event might not be a simple INV, where the segment between two points in the genome is flipped around; it may be more complex and lead to something like an "INVDUP".

See https://kero.hgc.jp/examples/CLCL/hg38/index.html for a good example and visualization of an INVDUP

Such an event would likely be represented as a series of breakends. But you could also ask: why not just write that INVDUP example like this?

2 321682 INV0 T <INVDUP> 6 PASS SVTYPE=INVDUP;END=421681


The reason might be that breakends can express more precision, giving the exact breakpoints, at the cost of being harder to interpret. We have been working on JBrowse 2 to build tools that help people interpret breakends, but it is still challenging.


Also, just as a footnote: it may help to show exactly what your breakends look like. People may be able to interpret them better if you post them here.


Well, thank you very much cmdcolin for taking the time to properly explain breakends! I am really thankful for that. I will take a look into the resources that you provided.

As a more concrete example, here are the kind of results I need to interpret:

CONTIG TYPE POS EVENTLENGTH REF ALT
Chromosome_contig SYMBOLIC 2256635 -1 G G]plasmid_contig:17079]
plasmid_contig SYMBOLIC 17079 -1 A A]Chromosome_contig:2256635]

Does that mean that an inversion occurred between the chromosome and plasmid contigs? What bugs me is that the event length is -1 bp, and the associated genomic coordinates do not match that event length...

Many thanks, Audrey


I gave an inversion as the example in my answer, but the data you are showing is more likely a translocation, or perhaps an integration of the plasmid into the chromosome. Try reading the linked VCF PDF and you will see many examples! Your data might not be in true VCF format (EVENTLENGTH is not a VCF field), but if it uses breakends the way VCF does, you will find a lot of info there.
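To make the translocation vs. same-contig distinction concrete, here is a minimal Python sketch that checks whether two records point at each other and whether they sit on the same contig. The function names are hypothetical; it only reads the mate coordinate out of the ALT field and ignores orientation. When mates lie on different contigs there is no single distance between them, which may be why a length-style field such as EVENTLENGTH carries a placeholder like -1 (an assumption; check the calling tool's documentation).

```python
import re

# Captures the "chrom:pos" between brackets in a mated breakend ALT, e.g. G]plasmid_contig:17079]
_MATE = re.compile(r"[\[\]](?P<chrom>[^\[\]]+):(?P<pos>\d+)[\[\]]")

def mate_of(alt):
    """Return (contig, position) of the mate named in a breakend ALT string."""
    m = _MATE.search(alt)
    if m is None:
        raise ValueError(f"no mate position in ALT {alt!r}")
    return m["chrom"], int(m["pos"])

def describe_pair(rec_a, rec_b):
    """rec_a and rec_b are (contig, pos, alt) tuples; returns a short description."""
    chrom_a, pos_a, alt_a = rec_a
    chrom_b, pos_b, alt_b = rec_b
    # A proper mate pair points at each other in both directions.
    if mate_of(alt_a) != (chrom_b, pos_b) or mate_of(alt_b) != (chrom_a, pos_a):
        return "not a reciprocal mate pair"
    if chrom_a != chrom_b:
        # Different contigs: a junction, but no length can be computed.
        return f"inter-contig junction: {chrom_a}:{pos_a} <-> {chrom_b}:{pos_b}"
    return f"intra-contig junction spanning {abs(pos_b - pos_a)} bp on {chrom_a}"
```

For the two rows posted above, `describe_pair(("Chromosome_contig", 2256635, "G]plasmid_contig:17079]"), ("plasmid_contig", 17079, "A]Chromosome_contig:2256635]"))` reports an inter-contig junction between `Chromosome_contig:2256635` and `plasmid_contig:17079`.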