SVs are problematic for many pipelines/software as, unlike SNVs and small indels, each event involves at least two genomic loci.
Be aware that not all callers classify events correctly. Many classify events purely on break-end position and orientation. This results in deletion calls even when there is no copy number change to support the event (most callers), or inversion calls even when only one of the two inversion breakpoints actually exists (e.g. DELLY). For simple germline analysis this is probably OK, and you can just ignore all large or inter-chromosomal events, but for highly rearranged genomes (e.g. cancer) things are much more complicated.
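To make the classification problem concrete, here is a minimal Python sketch of orientation-only classification, plus the kind of copy number sanity check that it skips. It is not taken from any caller. The function names, the strand convention ("+" meaning the retained sequence lies to the left of the break-end) and the 0.75 copy ratio cutoff are all assumptions made for illustration.

```python
def classify_by_orientation(chrom1, pos1, strand1, chrom2, pos2, strand2):
    """Naive event type from break-end position and orientation only.

    Assumed convention: "+" means the retained sequence lies to the left
    of the break-end, "-" means it lies to the right.
    """
    if chrom1 != chrom2:
        return "inter-chromosomal"
    # Order the two break-ends so the strands read lower -> upper position.
    if pos1 > pos2:
        strand1, strand2 = strand2, strand1
    if strand1 == strand2:
        # Only ONE junction is needed to get here; a real inversion has two.
        return "inversion-like"
    if strand1 == "+" and strand2 == "-":
        return "deletion-like"
    return "duplication-like"


def supported_deletion(naive_type, copy_ratio, max_ratio=0.75):
    """Accept a deletion call only if the copy ratio also drops.

    copy_ratio is a hypothetical tumour/normal depth ratio over the
    spanned region; max_ratio is an arbitrary illustrative cutoff.
    """
    return naive_type == "deletion-like" and copy_ratio < max_ratio


naive = classify_by_orientation("chr1", 1000, "+", "chr1", 5000, "-")
print(naive, supported_deletion(naive, copy_ratio=1.0))
```

A junction with deletion-like orientation but an unchanged copy ratio is more likely to be part of a complex rearrangement than a simple deletion, which is why orientation alone is not enough in cancer genomes.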
"... thought that we could get all these annotations with a package that is already available"
What you're asking for is really two separate processes: one that looks at the intervening sequence of simple events, and another that looks at break-end overlap for fusions, inter-chromosomal events and complex events.
If you're familiar with Bioconductor then you can do the first part relatively easily for a BEDPE file: just convert the records to GRanges intervals and calculate overlaps against the Bioconductor annotation package for your organism.
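The logic of that first part can be sketched in a few lines of plain Python. This is only a stand-in for the GRanges overlap step, not Bioconductor code. The gene coordinates are made up, the helper names are hypothetical, and it assumes the standard BEDPE column order (chrom1, start1, end1, chrom2, start2, end2, ...) with 0-based half-open coordinates.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def event_span(bedpe_line):
    """Interval spanned by an intra-chromosomal BEDPE record, else None."""
    f = bedpe_line.rstrip("\n").split("\t")
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    if chrom1 != chrom2:
        return None  # no intervening sequence for inter-chromosomal events
    return Interval(chrom1, min(start1, start2), max(end1, end2))


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit(span, genes):
    """Names of annotated genes that overlap the event span."""
    return [name for name, iv in genes.items() if overlaps(span, iv)]


# Made-up annotation, standing in for an organism annotation package.
GENES = {
    "GENE_A": Interval("chr1", 1000, 2000),
    "GENE_B": Interval("chr1", 9000, 9500),
    "GENE_C": Interval("chr2", 100, 900),
}

record = "chr1\t1500\t1510\tchr1\t5000\t5010\tdel1\t.\t+\t-"
span = event_span(record)
hit = genes_hit(span, GENES)
print(span, hit)
```

Using the whole span from the first break-end to the second is a simplification: it is only meaningful for simple events such as deletions and tandem duplications.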
For the second part you might be interested in my StructuralVariantAnnotation package. Its key feature is conversion of VCFs generated by a number of popular SV callers into a GRanges object containing break-end coordinates. Once in GRanges format, you can again use the Bioconductor annotation packages to calculate feature overlap.
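Conceptually, break-end overlap differs from span overlap in that each end of a breakpoint is annotated separately and the two results are then paired up. The Python sketch below illustrates that idea only. It does not use or mimic the StructuralVariantAnnotation API, and the gene table, breakpoint tuples and function names are all hypothetical.

```python
def genes_at(chrom, pos, genes):
    """Names of genes whose [start, end) interval contains a break-end."""
    return sorted(
        name
        for name, (g_chrom, start, end) in genes.items()
        if g_chrom == chrom and start <= pos < end
    )


def breakpoint_gene_pairs(breakpoints, genes):
    """Pair up the genes hit by the two break-ends of each breakpoint.

    Each breakpoint is a (chrom1, pos1, chrom2, pos2) tuple. Pairs of
    distinct genes are candidate fusions; this ignores strand and
    reading frame, which a real fusion analysis would need to check.
    """
    pairs = []
    for chrom1, pos1, chrom2, pos2 in breakpoints:
        for g1 in genes_at(chrom1, pos1, genes):
            for g2 in genes_at(chrom2, pos2, genes):
                if g1 != g2:
                    pairs.append((g1, g2))
    return pairs


# Made-up gene coordinates: name -> (chrom, start, end), 0-based half-open.
GENES = {
    "GENE_A": ("chr1", 1000, 2000),
    "GENE_C": ("chr2", 100, 900),
}

# Made-up breakpoints: only the first has both break-ends inside a gene.
BREAKPOINTS = [
    ("chr1", 1500, "chr2", 400),
    ("chr1", 5000, "chr2", 400),
]

pairs = breakpoint_gene_pairs(BREAKPOINTS, GENES)
print(pairs)
```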