Question: annotation of SV (structural variants)
5
14 months ago by
Bogdan720
Palo Alto, CA, USA
Bogdan720 wrote:

Dear all,

Which tools would you recommend for annotating SVs (structural variants) with gene information? The SVs include deletions, insertions, translocations, inversions, and duplications. Thank you!

-- bogdan

annotation sv • 2.0k views
ADD COMMENTlink modified 14 months ago by JJ430 • written 14 months ago by Bogdan720

You have these stored in VCF format?

ADD REPLYlink written 14 months ago by Kevin Blighe41k

yes, Kevin, the files are in VCF format.

ADD REPLYlink modified 14 months ago • written 14 months ago by Bogdan720

Can you try something like ANNOVAR? Start from the Quick Start-Up Guide; installing everything can take a while.

Then, to annotate a VCF, you would do something like this:

perl annovar/convert2annovar.pl -format vcf4 -withzyg --allsample -outfile anotation.ann MySV.vcf ;

perl annovar/table_annovar.pl anotation.ann.avinput annovar/humandb/ -buildver hg19 -remove -otherinfo -onetranscript -protocol refGene,cytoBand,esp6500siv2_all,exac03,dbnsfp30a,avsnp147,cosmic70 -operation g,r,f,f,f,f,f -nastring "NA" -csvout ;

ANNOVAR also allows you to annotate against the Database of Genomic Variants (DGV).

ADD REPLYlink written 14 months ago by Kevin Blighe41k

SV annotation (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information) can be easily automated !

You can look at this post describing the AnnotSV tool: Annotation for SV and CNV

ADD REPLYlink modified 9 months ago • written 9 months ago by LGMgeo90
2
14 months ago by
JJ430
JJ430 wrote:

I recommend SnpEff. You can annotate SNPs and SVs together in one go :) And you get a nice report!

ADD COMMENTlink written 14 months ago by JJ430

Hi, how can you use SnpEff to annotate translocations, for example?

ADD REPLYlink written 14 months ago by alons270
1

I have tried SnpEff with an output VCF file from DELLY 0.7.5, where the TRANSLOCATIONS are annotated in <tra> format and not in <bnd> format, and the latest version of SnpEff did not work on the <tra> items.

ADD REPLYlink written 14 months ago by Bogdan720

I see, I'll try too, please update if you succeed. I will as well.

ADD REPLYlink written 14 months ago by alons270

You should update Delly to 0.7.7, which uses the BND format; then it will work.

ADD REPLYlink modified 14 months ago • written 14 months ago by JJ430
0
14 months ago by
d-cameron2.0k
Australia
d-cameron2.0k wrote:

This very much depends on what sort of annotation you want to do.

If you're interested in breakend overlap annotation, then I recommend using my StructuralVariantAnnotation package to convert to a paired breakend GRanges object, then using the standard Bioconductor packages to annotate things like repeat overlaps, gene position and orientation (i.e. is this a viable fusion gene), and so on.

If you're wanting to do annotations of the functional impact of the variant, then this is very much a different problem and the state of the art in this is well behind that of SNVs and small indels.

ADD COMMENTlink written 14 months ago by d-cameron2.0k
1

Dear Daniel, thank you for your suggestions, and congratulations on the publication of the GRIDSS package.

Regarding the package "StructuralVariantAnnotation", I have started learning how to use it, and if I may ask please 2 questions, I would appreciate having your replies :

-- please would it be possible to share a piece of R code that uses "findBreakpointOverlaps(gr)" in order to infer the overlap of the breakpoints with exons, introns or intergenic regions ?

-- after the overlap of the breakpoints with the gene information, is there any function available that infers gene fusions or truncations ? (I used to do it with a Perl script, although having everything in R would be very helpful).

if you wish, we can also talk via email. My email address is tanasa@gmail.com. Thank you ;)

ADD REPLYlink written 14 months ago by Bogdan720
1

please would it be possible to share a piece of R code that uses "findBreakpointOverlaps(gr)" in order to infer the overlap of the breakpoints with exons, introns or intergenic regions ?

The nice thing about the breakpoint format I'm using (a normal bioconductor GRanges object with just an extra $partner field to identify the other side) is that you don't have to do anything special for this. Looking at breakend overlaps is just like looking at SNV/small indel overlaps.

You can see an example of this in https://github.com/PapenfussLab/gridss/blob/master/example/somatic-fusion-gene-candidates.R which is a very simple script that checks for candidate gene fusions. In line 52, you can see that calculating the gene overlapping a breakend just uses the standard GenomicRanges findOverlaps function. findBreakpointOverlaps is useful when you want to find overlaps between breakpoints (e.g. comparing an SV caller to a truth set, or creating an ensemble SV call set), but if you're just comparing breakends to genomic intervals, StructuralVariantAnnotation is only needed for the conversion of VCF to the breakend GRanges format.
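To illustrate the idea that a breakend overlap is just a position-in-interval lookup, here is a minimal Python sketch. It is not the Bioconductor code from the linked script; the `Gene` and `Breakend` record types, the `genes_at` helper, and the toy coordinates are all hypothetical and exist only for this example.

```python
from collections import namedtuple

# Hypothetical record types for illustration only; not part of any package.
Gene = namedtuple("Gene", "chrom start end strand name")
Breakend = namedtuple("Breakend", "chrom pos strand")


def genes_at(breakend, genes):
    """Return the genes whose interval (1-based, inclusive) contains the breakend."""
    return [
        g for g in genes
        if g.chrom == breakend.chrom and g.start <= breakend.pos <= g.end
    ]


# Toy annotation: two overlapping genes on chr1 and one gene on chr2.
genes = [
    Gene("chr1", 100, 500, "+", "GENE_A"),
    Gene("chr1", 400, 900, "-", "GENE_B"),
    Gene("chr2", 100, 500, "+", "GENE_C"),
]

hits = genes_at(Breakend("chr1", 450, "+"), genes)
print([g.name for g in hits])
```

A linear scan like this is fine for a handful of breakends; for genome-scale annotation an interval index (which is what findOverlaps provides in R) is the better choice.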

ADD REPLYlink written 14 months ago by d-cameron2.0k

Dear Daniel, many many thanks ! I very much appreciate you sharing the code and your suggestions. May I ask: does the code infer fusions starting only from TRA/BND, or can we also use DEL, DUP, or INV to infer fusions ? Thanks !

ADD REPLYlink written 14 months ago by Bogdan720

StructuralVariantAnnotation will convert the output of SV callers into the same GRanges format regardless of whether the caller used TRA/BND or DEL/DUP/INV format to make the call. It works with VCFv4.2 or VCFv4.3 compliant output, but I've added some extra code to handle some of the more popular callers that don't follow the VCF specs correctly.

I've used it successfully on breakdancer, cortex, crest, delly, gridss, hydra, lumpy, manta, pindel, and socrates (although you need to use my conversion script to turn the breakdancer & socrates outputs into VCF format).

ADD REPLYlink modified 14 months ago • written 14 months ago by d-cameron2.0k

Dear Daniel, thank you. We have used DELLY and LUMPY, and now I am also looking into GRIDSS, as our friends from Stanford were very excited about using GRIDSS for calling SVs in cancer genomes.

ADD REPLYlink written 14 months ago by Bogdan720

Hi Daniel, thank you, the R code works very well. If I may add a question please:

when using breakpointRanges(vcf) to transform a VCF into a GRanges object, how are the + and - strands defined/assigned ? I am using a VCF file from DELLY, where the TRA (and other SVs) are encoded as 3to5, 5to3, 3to3, or 5to5. Thanks a lot !

ADD REPLYlink written 14 months ago by Bogdan720
+ indicates that the DNA before the breakpoint is involved in the adjacency, - is the other orientation. For a 500bp chr1, the following GRanges:

chr1 100 +
chr1 200 -

Would indicate a deletion as the DNA leading up to chr1:100 is connected to the DNA at chr1:200 and beyond.
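The same reading can be extended to the other strand combinations on a single chromosome. The sketch below is only an illustration of that convention in Python, not code from StructuralVariantAnnotation; the function name and the "-like" labels are hypothetical, and real events (especially complex rearrangements) may not fit these simple categories.

```python
def classify_intrachromosomal(pos1, strand1, pos2, strand2):
    """Name the simple event implied by two breakends on the same chromosome.

    "+" means the DNA before the position is kept, "-" means the DNA after it.
    """
    # Order the two breakends by position so the rule does not depend on input order.
    (_, lo_strand), (_, hi_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    if lo_strand == "+" and hi_strand == "-":
        # DNA up to the lower position joins DNA from the higher position onward.
        return "deletion-like"
    if lo_strand == "-" and hi_strand == "+":
        # DNA up to the higher position joins DNA from the lower position onward.
        return "duplication-like"
    # Same strand on both sides: one segment must be reverse-complemented.
    return "inversion-like"


print(classify_intrachromosomal(100, "+", 200, "-"))
```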

ADD REPLYlink written 14 months ago by d-cameron2.0k

Dear Daniel, thank you for your reply.

Please if I may add a note below, as I would like to understand much better how the "+" and "-" are assigned please :

we have been using DELLY on a few cancer genomes that we had, and DELLY gives the following SV:

DEL are always 3to5

DUP are always 5to3

INV can be 5to5 or 3to3

TRA can be 5to5, 3to3, 5to3 or 3to5.

How do these SV above translate into a notation with + and - ? thank you very much again !

ADD REPLYlink modified 14 months ago • written 14 months ago by Bogdan720

The special case handling for DELLY TRA code can be found https://github.com/PapenfussLab/StructuralVariantAnnotation/blob/master/R/extensions-VCF.R#L336 .

I just reviewed the code and the custom DELLY 5to5 and 3to3 attributes are not used for inversions (I just found this issue myself with manta INV3 and INV5 non-standard attributes). The current version of StructuralVariantAnnotation will ignore 5to5 and 3to3 and report both inversion breakpoints.
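As a rough guide to the orientation question, one plausible translation of the DELLY connection types into +/- pairs is sketched below in Python. It assumes that a "3" end means the DNA before the breakpoint is kept (+) and a "5" end means the DNA after it is kept (-), with the first symbol applying to the first (CHROM) side. This mapping is an assumption made for illustration and should be checked against the package source linked above; the names `CT_TO_STRANDS` and `ct_to_strands` are hypothetical.

```python
# Assumed mapping from a DELLY connection type to (first strand, second strand).
# "3" end -> "+" (DNA before the breakpoint is kept)
# "5" end -> "-" (DNA after the breakpoint is kept)
CT_TO_STRANDS = {
    "3to5": ("+", "-"),  # deletion-type connection
    "5to3": ("-", "+"),  # duplication-type connection
    "3to3": ("+", "+"),  # inversion-type connection
    "5to5": ("-", "-"),  # inversion-type connection
}


def ct_to_strands(ct):
    """Look up the strand pair for a connection type; unknown values raise KeyError."""
    return CT_TO_STRANDS[ct]


for ct in ("3to5", "5to3", "3to3", "5to5"):
    print(ct, ct_to_strands(ct))
```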

ADD REPLYlink written 14 months ago by d-cameron2.0k

Dear Daniel, thanks a lot ! we wish you a happy and fruitful week !

If I may add a question, this time it is about TRA, specifically about 3to5 and 5to3 TRANSLOCATIONS.

When do we define a TRA as 3to5, and when is it a TRA 5to3 ? I believe that these are RECIPROCAL TRA.

thank you !

ADD REPLYlink written 14 months ago by Bogdan720

Reciprocal translocations should have two separate translocation events at the same position. Depending on the DNA repair mechanism, the 'same' positions are usually slightly different positions.
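One way to look for such pairs is sketched below in Python, under stated assumptions: both calls list the two chromosomes in the same order, a reciprocal partner has both orientations flipped, and "the same position" means within a tolerance. The `Breakpoint` type, the `is_reciprocal` helper, and the default `max_gap` of 100 bp are hypothetical choices for illustration, not values from any caller or package.

```python
from collections import namedtuple

# Hypothetical record: two joined breakends with their "+"/"-" orientations.
Breakpoint = namedtuple("Breakpoint", "chrom1 pos1 strand1 chrom2 pos2 strand2")


def flip(strand):
    """Return the opposite orientation."""
    return "-" if strand == "+" else "+"


def is_reciprocal(a, b, max_gap=100):
    """True if b looks like the reciprocal partner of translocation a."""
    return (
        a.chrom1 != a.chrom2                      # interchromosomal events only
        and (a.chrom1, a.chrom2) == (b.chrom1, b.chrom2)
        and b.strand1 == flip(a.strand1)          # both sides use the other orientation
        and b.strand2 == flip(a.strand2)
        and abs(a.pos1 - b.pos1) <= max_gap       # nearly the same position on each side
        and abs(a.pos2 - b.pos2) <= max_gap
    )


first = Breakpoint("chr1", 1000, "+", "chr2", 5000, "-")
second = Breakpoint("chr1", 1010, "-", "chr2", 4990, "+")
print(is_reciprocal(first, second))
```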

ADD REPLYlink written 14 months ago by d-cameron2.0k

Dear Daniel, thank you. May I add and ask : how is a TRA 3to5 different from a TRA 5to3 ? thank you ;) !

ADD REPLYlink written 14 months ago by Bogdan720

And talking about INV,

yes, thank you, I am very happy that your R package reports annotations for both 5to5 and 3to3 INV breakpoints ;)

ADD REPLYlink written 14 months ago by Bogdan720

And, if I may add please another question (that is a bit naive as I have not checked the answer yet):

depending on the type of SV and on its direction/orientation (3to5, 5to3, 5to5, 3to3),

if we consider INVERSIONS,

would we need to include, in the R script that computes the gene fusions, a piece of R code that inverts the INV coordinates, or does the function "breakpointRanges" already do this internally ?

I am asking because I would also like to compute the TRUNCATIONS. thank you very much ;)

ADD REPLYlink written 14 months ago by Bogdan720

In my gene fusion example code, gene orientation and the breakpoint direction must be consistent with the fusion. An upstream gene in the fusion must have the strand matching the breakpoint orientation (a + strand transcript would require a + breakpoint orientation), and the downstream gene requires the opposite.
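The rule above can be written out as a small Python sketch. This is an illustration of the stated logic only, not the R code from the example script; the function names simply echo the couldBeFivePrimeEnd and couldBeThreePrimeStart flags used there, and the inputs are plain "+"/"-" strings.

```python
def could_be_five_prime_end(gene_strand, breakend_strand):
    """Upstream partner: the gene strand must match the breakend orientation."""
    return gene_strand == breakend_strand


def could_be_three_prime_start(gene_strand, breakend_strand):
    """Downstream partner: the gene strand must be opposite to the breakend orientation."""
    return gene_strand != breakend_strand


def is_fusion_candidate(gene1_strand, breakend1_strand, gene2_strand, breakend2_strand):
    """True if one side can supply the 5' part and the other side the 3' part."""
    return (
        could_be_five_prime_end(gene1_strand, breakend1_strand)
        and could_be_three_prime_start(gene2_strand, breakend2_strand)
    ) or (
        could_be_three_prime_start(gene1_strand, breakend1_strand)
        and could_be_five_prime_end(gene2_strand, breakend2_strand)
    )


# Deletion-like breakpoint (+ at the first side, - at the second) between two + strand genes.
print(is_fusion_candidate("+", "+", "+", "-"))
```

With a deletion-like breakpoint, two genes on the same strand pass this check while genes on opposite strands do not. Whether the fused transcript stays in frame is a separate question that this sketch does not address.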

Truncation logic will be quite similar.

ADD REPLYlink written 14 months ago by d-cameron2.0k

Dear Daniel,

talking about TRUNCATIONS: to define gene TRUNCATIONS based on your R code, which is available at https://github.com/PapenfussLab/gridss/blob/master/example/somatic-fusion-gene-candidates.R, I believe that we only have to change lines 70-72 of the R code by negating "couldBeFivePrimeEnd" :

gr <- gr[ (gr$couldBeThreePrimeStart & (! partner(gr)$couldBeFivePrimeEnd)) | ( (! gr$couldBeFivePrimeEnd) & partner(gr)$couldBeThreePrimeStart),]

thank you very much !

ADD REPLYlink written 14 months ago by Bogdan720

Hi Daniel,
I have tried processing delly v0.7.9 output vcf file with breakpointRanges() and all of the BNDs were removed as unpaired, so I was wondering if there was a way around this issue, using your package?

ADD REPLYlink modified 6 months ago • written 6 months ago by anamaria30

The DELLY authors updated their output to use BND notation (which is good) but they only write one of the two records required for a valid VCF BND breakpoint (which is bad).

Technically my tool is doing the correct thing but that isn't particularly useful for users. I have a few other projects taking my time at the moment but I should have an updated version that handles the DELLY notational non-compliance by the end of this month.

Edit: as a workaround, you can update the DELLY VCF to include the other side of the BND breakpoint.
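A sketch of how the missing mate could be derived is shown below in Python. It assumes the existing record's ALT field uses the standard VCF bracket notation (for example `G]17:198982]`); whether a given DELLY version writes exactly this form is not confirmed here. The `mate_breakend` helper is hypothetical, it uses "N" as a placeholder reference base for the mate, and it does not handle record IDs, MATEID tags, or contig names containing a colon.

```python
import re

# ALT forms in VCF breakend notation: t[p[  t]p]  ]p]t  [p[t
_BND = re.compile(
    r"^(?P<pre>[A-Za-z.]*)(?P<b1>[\[\]])(?P<chrom>[^:\[\]]+):(?P<pos>\d+)"
    r"(?P<b2>[\[\]])(?P<post>[A-Za-z.]*)$"
)


def mate_breakend(chrom, pos, alt, mate_ref="N"):
    """Return (chrom, pos, alt) for the other side of a single BND record."""
    m = _BND.match(alt)
    if m is None or m["b1"] != m["b2"] or bool(m["pre"]) == bool(m["post"]):
        raise ValueError(f"not a recognised breakend ALT: {alt!r}")
    # Bases written before the bracket: the DNA before the local position is kept.
    local_keeps_before = bool(m["pre"])
    # A "]" bracket: the DNA before the remote position is kept.
    remote_keeps_before = m["b1"] == "]"
    # In the mate record the two roles are swapped.
    bracket = "]" if local_keeps_before else "["
    target = f"{bracket}{chrom}:{pos}{bracket}"
    mate_alt = f"{mate_ref}{target}" if remote_keeps_before else f"{target}{mate_ref}"
    return m["chrom"], int(m["pos"]), mate_alt


print(mate_breakend("2", 321681, "G]17:198982]"))
```

The four bracket forms map onto each other in the same way as the paired breakend examples in the VCF specification, apart from the placeholder base.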

ADD REPLYlink modified 4 months ago • written 4 months ago by d-cameron2.0k

Thanks a lot, Daniel ! yes, if you could please let us know when we can start using it, it would be very helpful !

ADD REPLYlink written 4 months ago by Bogdan720