Annotation of SV (Structural Variants)
5.7 years ago
Bogdan ★ 1.3k

Dear all,

for a list of Structural Variants (including deletions, duplications, inversions, translocations), either in VCF or BEDPE format, we would like to have the gene annotations, and the lists of the following sets of genes :

-- fusions (if both breakpoints are in exons, introns, or UTRs)
-- truncations (if only one breakpoint is in an exon, intron, or UTR, and the other breakpoint is in an intergenic area)
-- the genes in the areas that are deleted, duplicated, or inverted
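The fusion/truncation rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the question: the function name, the labels, and the "intragenic" case (both breakpoints in the same gene) are assumptions made for the example, not the output of any existing tool.

```python
from typing import Optional


def classify_breakpoint_pair(gene1: Optional[str], gene2: Optional[str]) -> str:
    """Classify an SV from the gene hit by each breakpoint.

    gene1 / gene2 are the names of the genes whose exon, intron or UTR
    contains the breakpoint, or None when the breakpoint is intergenic.
    """
    if gene1 and gene2:
        # Assumption: two breakpoints in the same gene are reported
        # separately rather than as a fusion.
        return "fusion" if gene1 != gene2 else "intragenic"
    if gene1 or gene2:
        # Exactly one breakpoint is genic, the other is intergenic.
        return "truncation"
    return "intergenic"
```

For example, `classify_breakpoint_pair("GENE_A", None)` falls into the truncation branch because only one side is genic.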

Although I wrote some scripts in Perl based on ANNOVAR, I thought that we could get all these annotations with a package that is already available?

thanks a lot,

-- bogdan

vcf annotation structural variants • 4.7k views

Dear Daniel, these are very good suggestions, thank you! I'm planning to use StructuralVariantAnnotation and compare the results with those derived from my Perl scripts.

Our work is primarily related to SOMATIC SVs (in pediatric cancers), so may I ask: do you have any recommendations regarding the SV callers to use? I've started with DELLY, LUMPY, and MANTA and am now comparing the results.

Also, I've read your paper and work on GRIDSS, and it looks great ;) although it seems that the focus has been more on germline calls ;)


The GRIDSS paper focused on germline results, but most of our applications have been in cancer genomics, and GRIDSS did manage to win the ICGC-TCGA DREAM Somatic Mutation Calling Challenge (SV sub-challenge #5).

See https://github.com/PapenfussLab/gridss/blob/master/example/somatic.sh for very basic tumour/normal somatic variant calling using GRIDSS.


Thanks, Daniel. I can run GRIDSS as soon as our new PBS cluster is completely configured.

Also, may I ask what filtering criteria you would recommend for SVs? Particularly AF, or the number of SR and PR.

And, if you do not mind me asking, in the Somatic Mutation Calling Challenge, besides DELLY, MANTA and GRIDSS, which other algorithms did reasonably well?


Somatic calling Leaderboard results are publicly available at https://www.synapse.org/#!Synapse:syn312572/wiki/61509


Dear Daniel, thank you for the information on SV calling. Considering your experience with all SV callers, and the nice ROC curves from your publication, may I ask please :

-- using 2-3 SV callers probably gives fewer false negatives than using only 1 SV caller. If so, besides GRIDSS, which other SV caller(s) would you recommend?

thanks a lot for sharing your experience with us !


Please do not add answers unless you're answering the top level question. If you're replying to someone, use the Add Comment or Add Reply options. I'm moving your "answer"s to comments now.


thanks, Ram ;) a pretty exciting conversation, I shall say ;)


ok ;) thank you, Ram ;)


How did you do with this? I'm developing a pipeline that works well for me if you're still looking for help

5.7 years ago
d-cameron ★ 2.8k

SVs are problematic for many pipelines/software as, unlike SNVs and small indels, each event involves at least two genomic loci.

Be aware that not all callers correctly classify events. Many callers will classify events purely on their break-end position and orientation. This results in deletion calls even when there is no copy number change to support the event (most callers), or inversion calls even when only one of the two inversion breakpoints actually exists (e.g. DELLY). For simple germline analysis this is probably OK, and you can just ignore all large or inter-chromosomal events, but for highly rearranged genomes (e.g. cancer), things are much more complicated.
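To make the problem concrete, here is a minimal sketch of what classification "purely on break-end position and orientation" looks like. It assumes the common BEDPE strand convention for intrachromosomal pairs with the first break-end at the lower coordinate (`+-` deletion-like, `-+` duplication-like, `++`/`--` inversion-like). The function names are invented for this illustration and do not come from any caller.

```python
def naive_sv_type(chrom1: str, strand1: str, chrom2: str, strand2: str) -> str:
    """Label a breakpoint from orientation alone (no copy number evidence).

    Assumes the first break-end has the lower coordinate when both
    break-ends are on the same chromosome.
    """
    if chrom1 != chrom2:
        return "TRA"
    by_orientation = {"+-": "DEL", "-+": "DUP", "++": "INV", "--": "INV"}
    return by_orientation[strand1 + strand2]


def inversion_is_supported(orientations) -> bool:
    """A simple inversion should produce both a '++' and a '--' junction."""
    return {"++", "--"} <= set(orientations)
```

A single `++` junction is labelled `INV` by `naive_sv_type`, yet `inversion_is_supported(["++"])` is false, which is exactly the one-sided inversion case described above. Similarly, a `+-` pair is labelled `DEL` without any check for a copy number drop.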

"thought that we could get all these annotations with a package that is already available"

What you're asking is really two separate processes: one for looking at the intervening sequence of simple events, and another for break-end overlap for fusions/interchromosomal/complex events.

If you're familiar with Bioconductor then you can do the first part relatively easily for a BEDPE: just convert to GRanges intervals and calculate overlaps against the Bioconductor annotation package for your organism.
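The idea behind that first part (turn each simple event into one interval, then intersect with gene intervals) can be sketched without any package. The snippet below is a plain-Python stand-in, not the GRanges API: the `Gene` tuple, the function names, and the toy gene coordinates are all made up for illustration, and coordinates are treated as 0-based half-open, as in BEDPE.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: str


def sv_span(bedpe_line: str) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) spanned by an intrachromosomal BEDPE record."""
    f = bedpe_line.rstrip("\n").split("\t")
    chrom1, start1, end1 = f[0], int(f[1]), int(f[2])
    chrom2, start2, end2 = f[3], int(f[4]), int(f[5])
    if chrom1 != chrom2:
        return None  # interchromosomal: there is no intervening sequence
    return chrom1, min(start1, start2), max(end1, end2)


def genes_in_span(span: Tuple[str, int, int], genes: List[Gene]) -> List[str]:
    """Names of genes overlapping the span (half-open interval overlap)."""
    chrom, start, end = span
    return [g.name for g in genes
            if g.chrom == chrom and g.start < end and start < g.end]


# Toy annotation with hypothetical genes
genes = [Gene("chr1", 1000, 2000, "GENE_A"),
         Gene("chr1", 5000, 6000, "GENE_B"),
         Gene("chr2", 1000, 2000, "GENE_C")]
record = "chr1\t1500\t1501\tchr1\t5500\t5501\tdel1\t.\t+\t-"
hits = genes_in_span(sv_span(record), genes)
```

A linear scan over genes is fine for a sketch; for genome-wide annotation an interval tree or a sorted sweep would be the usual choice.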

For the second part you might be interested in my StructuralVariantAnnotation package. Its key feature is conversion of VCFs generated by a number of popular SV callers into a GRanges object containing break-end coordinates. Once in GRanges format, you can again use the Bioconductor annotation packages to calculate feature overlap.
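As a rough illustration of what "break-end coordinates" means, the sketch below parses a VCF breakend (BND) ALT string such as `G]chr17:198982]` into a local and a mate position with orientations. It is plain Python, not the StructuralVariantAnnotation API, and it handles only bracketed BND notation (symbolic alleles such as `<DEL>` return `None`). The `Breakend` tuple and `parse_bnd` name are hypothetical, and the strand convention used here ("+" meaning the retained sequence lies to the left of the position) is an assumption of the example.

```python
import re
from typing import NamedTuple, Optional

# prefix bases, bracket, mate contig:position, bracket, suffix bases
_BND = re.compile(r"^([A-Za-z.]*)([\[\]])(.+):(\d+)([\[\]])([A-Za-z.]*)$")


class Breakend(NamedTuple):
    chrom: str
    pos: int
    strand: str
    mate_chrom: str
    mate_pos: int
    mate_strand: str


def parse_bnd(chrom: str, pos: int, alt: str) -> Optional[Breakend]:
    """Parse a bracketed VCF breakend ALT; return None for anything else."""
    m = _BND.match(alt)
    if m is None or m.group(2) != m.group(5):
        return None
    prefix, bracket, mate_chrom, mate_pos, _, suffix = m.groups()
    if bool(prefix) == bool(suffix):
        return None  # local bases must be on exactly one side
    # Bases before the bracket: the local sequence is kept up to pos.
    strand = "+" if prefix else "-"
    # "]" means the mate piece extends to the left of its position.
    mate_strand = "+" if bracket == "]" else "-"
    return Breakend(chrom, pos, strand, mate_chrom, int(mate_pos), mate_strand)
```

Each parsed break-end is then a single-position interval that can be intersected with gene, exon, or intron intervals in the same way as any other range.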

4.4 years ago
LGMgeo ▴ 100

I suggest using AnnotSV for SV annotation (annotation with gene names and locations, OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information).

AnnotSV constructs an annotation based on the full-length SV, and also an annotation for each gene within the SV. You will thus have access to:

• all the overlapped genes information (ID, OMIM...)

• the SV location within each overlapped gene (e.g. "exon3-intron11", "txStart-intron19", ...). You can thus determine fusion or truncation events.
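As a hedged sketch of how such location strings might be post-processed, the snippet below splits a value like "txStart-intron19" into its two ends and reports whether the SV covers the whole gene, part of it, or lies entirely inside it. The function name and the three labels are invented for this example, and treating any end that begins with "tx" as a transcript boundary is an assumption based only on the examples quoted above.

```python
def describe_location(location: str) -> str:
    """Summarise a 'left-right' location string for one overlapped gene."""
    left, right = location.split("-", 1)
    # Assumption: ends named txStart / txEnd mean the SV reaches or
    # passes the transcript boundary on that side.
    boundary_ends = sum(end.startswith("tx") for end in (left, right))
    if boundary_ends == 2:
        return "whole_gene"
    if boundary_ends == 1:
        return "partial_gene"
    return "intragenic"
```

A "partial_gene" result is the kind of record worth inspecting as a truncation or fusion candidate, together with what the other SV breakpoint hits.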

Input format: VCF or BED

You can look at this post describing the AnnotSV tool: Annotation for SV and CNV