Question: annotation of SV (structural variants)
7
gravatar for Bogdan
2.0 years ago by
Bogdan910
Palo Alto, CA, USA
Bogdan910 wrote:

Dear all,

which tools would you recommend to use for the annotation of SV (structural variants) with gene information (SV including deletions, insertions, translocations, inversions, duplications). thank you !

-- bogdan

annotation sv • 3.3k views
ADD COMMENTlink modified 22 days ago by rishabh3196sharma0 • written 2.0 years ago by Bogdan910

You have these stored in VCF format?

ADD REPLYlink written 2.0 years ago by Kevin Blighe53k

yes, Kevin, the files are in VCF format.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Bogdan910

Can you try something like ANNOVAR? - start from the Quick Start-Up Guide. Installing everything can take a while.

Then, to annotate a VCF, you would do something like this:

perl annovar/convert2annovar.pl -format vcf4 -withzyg --allsample -outfile anotation.ann MySV.vcf ;

perl annovar/table_annovar.pl anotation.ann.avinput annovar/humandb/ -buildve hg19 -remove -otherinfo -onetranscript -protocol refGene,cytoBand,esp6500siv2_all,exac03,dbnsfp30a,avsnp147,cosmic70 -operation g,r,f,f,f,f,f -nastring "NA" -csvout ;

ANNOVAR Also allows you to annotate via the Database of Genomic Variants (DGV).

ADD REPLYlink written 2.0 years ago by Kevin Blighe53k

SV annotation (with OMIM, DGV, 1000g, haploinsufficiency, TAD, ... and also with your own in-house information) can be easily automated !

You can look at this post describing the annotSV tool: Annotation for SV and CNV

ADD REPLYlink modified 18 months ago • written 18 months ago by LGMgeo90
2
gravatar for JJ
2.0 years ago by
JJ470
JJ470 wrote:

I recommend snpeff. You can annotate SNPs and SVs together in one go :) And you get a nice report!

ADD COMMENTlink written 2.0 years ago by JJ470

Hi, how can you use Snpeff to annotate translocations for example?

ADD REPLYlink written 23 months ago by alons270
1

I have tried SnpEff with an output vcf file from DELLY 0.7.5, where the TRANSLOCATIONS are annotated in <tra> format, and not in <bnd> format, and the latest version of SnpEff did not wok on <tra> items.

ADD REPLYlink written 23 months ago by Bogdan910

I see, I'll try too, please update if you succeed. I will as well.

ADD REPLYlink written 23 months ago by alons270

You should update to Delly to 0.7.7 which uses the BND format - then it will work

ADD REPLYlink modified 23 months ago • written 23 months ago by JJ470
0
gravatar for d-cameron
2.0 years ago by
d-cameron2.1k
Australia
d-cameron2.1k wrote:

This very much depends on what sort of annotation you want to do.

If you're interested in breakend overlap annotation, then I recommend using my StructuralVariantAnnotation package to convert to a paired breakend GRanges object then using the standard bioconductor packages to annotate stuff like repeat overlaps, gene position and orientation (ie is this a viable fusion gene), and so on.

If you're wanting to do annotations of the functional impact of the variant, then this is very much a different problem and the state of the art in this is well behind that of SNVs and small indels.

ADD COMMENTlink written 2.0 years ago by d-cameron2.1k
1

Dear Daniel, thank you for your suggestions, and congratulations for the publication of GRIDSS package.

Regarding the package "StructuralVariantAnnotation", I have started learning how to use it, and if I may ask please 2 questions, I would appreciate having your replies :

-- please would it be possible to share a piece of R code that uses "findBreakpointOverlaps(gr)" in order to infer the overlap of the breakpoints with exons, introns or intergenic regions ? -- after the overlap of the breakpoints with the gene information, is there any function available that infer gene fusions or truncations ? (i used to do it with a perl script, although having everything in R, it would be very helpful).

if you wish, we can also talk via email. My email address is tanasa@gmail.com. Thank you ;)

ADD REPLYlink written 24 months ago by Bogdan910
1

please would it be possible to share a piece of R code that uses "findBreakpointOverlaps(gr)" in order to infer the overlap of the breakpoints with exons, introns or intergenic regions ?

The nice thing about the breakpoint format I'm using (a normal bioconductor GRanges object with just an extra $partner field to identify the other side) is that you don't have to do anything special for this. Looking at breakend overlaps is just like looking at SNV/small indel overlaps.

You can see an example of this in https://github.com/PapenfussLab/gridss/blob/master/example/somatic-fusion-gene-candidates.R which is a very simple script that checks for candidate gene fusion. In line 52, you can see that to calculating the gene overlapping a breakend just uses the standard GenomicRanges findOverlaps function. findBreakpointOverlaps is useful when you want find overlaps between breakpoints (e.g. comparing a SV caller to a truth set, or creating an ensemble SV call set), but if you're just comparing breakends to genomic intervals, StructuralVariantAnnotation is only needed for the conversion of VCF to the breakend GRanges format.

ADD REPLYlink written 24 months ago by d-cameron2.1k

Dear Daniel, many many thanks ! very much appreciate sharing the code and your suggestions. Please may I ask, the code infers the fusions starting only from TRA/BND, or we can use also DEL, DUP, or INV in order to infer the fusions ? Thanks !

ADD REPLYlink written 24 months ago by Bogdan910

StructuralVariantAnnotation will convert the output of SV callers into the same GRanges format regardless of whether the caller used TRA/BND or DEL/DUP/INV format to make the call. It works with VCFv4.2 or VCFv4.3 compliant output, but I've added some extra code to handle some of the more popular callers that don't follow the VCF specs correctly.

I've used it successfully on breakdancer, cortex, crest, delly, gridss, hydra, lumpy, manta, pindel, and socrates (although you need to use my conversion script to turn the breakdancer & socrates outputs into VCF format).

ADD REPLYlink modified 23 months ago • written 23 months ago by d-cameron2.1k

Dear Daniel, thank you. We have used DELLY and LUMPY, and now i am looking also into GRIDSS, as our friends from Stanford were very very excited about using GRIDSS on calling SV in cancer genomes.

ADD REPLYlink written 23 months ago by Bogdan910

Hi Daniel, thank you, the R code works very well. If I may add a question please:

when using breakpointRanges(vcf) to transform a VCF into a GR, how are strands + and - especially defined/assigned ? I am using a vcf file from DELLY, where the TRA (and other SV) are encoded as 3to5, 5to3, 3to3, or 5tot5. thanks a lot !

ADD REPLYlink written 23 months ago by Bogdan910
  • indicates that the DNA before the breakpoint is involved in the adjacency, - is the other orientation. For a 500bp chr1 the following GRanges chr1 100 + chr1 200 -

Would indicate a deletion as the DNA leading up to chr1:100 is connected to the DNA at chr1:200 and beyond.

ADD REPLYlink written 23 months ago by d-cameron2.1k

Dear Daniel, thank you for your reply.

Please if I may add a note below, as I would like to understand much better how the "+" and "-" are assigned please :

we have been using DELLY on a few cancer genomes that we had, and DELLY gives the following SV:

DEL are always 3to5

DUP are always 5to3

INV can be 5to5 or 3to3

TRA can be 5to5, 3to3, 5to3 or 3to5.

How do these SV above translate into a notation with + and - ? thank you very much again !

ADD REPLYlink modified 23 months ago • written 23 months ago by Bogdan910

The special case handling for DELLY TRA code can be found https://github.com/PapenfussLab/StructuralVariantAnnotation/blob/master/R/extensions-VCF.R#L336 .

I just reviewed the code and the custom DELLY 5to5 and 3to3 attributes are not used for inversions (I just found this issue myself with manta INV3 and INV5 non-standard attributes). The current version of StructuralVariantAnnotation will ignore 5to5 and 3to3 and report both inversion breakpoints.

ADD REPLYlink written 23 months ago by d-cameron2.1k

Dear Daniel, thanks a lot ! we wish you a happy and fruitful week !

If I may add a question, this time if about TRA, specifically about 3to5, and 5to3 TRANSLOCATIONS.

When do we define a TRA as 3to5, and when it is a TRA 5to3 ? I believe that these are RECIPROCAL TRA.

thank you !

ADD REPLYlink written 23 months ago by Bogdan910

Reciprocal translocations should have two separate translocation events at the same position. Depending on the DNA repair mechanism, the 'same' position are usually slightly different positions.

ADD REPLYlink written 23 months ago by d-cameron2.1k

Dear Daniel, thank you. May I add and ask : how is a TRA 3to5 different than a TRA 5to3 ? thank you ;) !

ADD REPLYlink written 23 months ago by Bogdan910

And talking about INV,

yes, thank you, I am very happy that your R package reports annotations for both 5to5 ad 3to3 INV breakpoints ;)

ADD REPLYlink written 23 months ago by Bogdan910

And, if I may add please another question (that is a bit naive as I have not checked the answer yet):

depending on the type of SV and depending on the type of SV direction/orientation (3to5, 5to3, 5to5, 3tot3),

shall we refer to INVERSIONS,

would we need to include in the R script that computes the gene fusions a piece of R code for instance that inverts the INV coordinates, or the function "breakpointRanges" does it internally already ?

I am asking because I would also like to compute the TRUNCATIONS. thank you very much ;)

ADD REPLYlink written 23 months ago by Bogdan910

In my gene fusion example code, gene orientation and the breakpoint direction must be consistent with the fusion. An upstream gene in the fusion must have the strand matching the breakpoint orientation (+ strand transcript would require a + breakpoint orientation), and the downstream gene requires the oppposite.

Truncation logic will be quite similar.

ADD REPLYlink written 23 months ago by d-cameron2.1k

Dear Daniel,

talking about TRUNCATIONS, shall we define the gene TRUNCATIONS, based on your R code that is available at https://github.com/PapenfussLab/gridss/blob/master/example/somatic-fusion-gene-candidates.R, I believe that we only have to change the lines 70-72 of the R code by "negation" of "couldBeFivePrimeEnd" :

gr <- gr[ (gr$couldBeThreePrimeStart & (! partner(gr)$couldBeFivePrimeEnd)) | ( (! gr$couldBeFivePrimeEnd) & partner(gr)$couldBeThreePrimeStart),]

thank you very much !

ADD REPLYlink written 23 months ago by Bogdan910

Hi Daniel,
I have tried processing delly v0.7.9 output vcf file with breakpointRanges() and all of the BNDs were removed as unpaired, so I was wondering if there was a way around this issue, using your package?

ADD REPLYlink modified 15 months ago • written 15 months ago by anamaria30

The DELLY authors updated their output to use BND notation (which is good) but they only write one of the two records required for a valid VCF BND breakpoint (which is bad).

Technically my tool is doing the correct thing but that isn't particularly useful for users. I have a few other projects taking my time at the moment but I should have an updated version that handles the DELLY notational non-compliance by the end of this month.

Edit: as a workaround, you can update the DELLY VCF to include the other side of the BND breakpoint.

ADD REPLYlink modified 14 months ago • written 14 months ago by d-cameron2.1k

Thanks a lot, Daniel ! yes, if you could please let us know when we can start using it, it would be very helpful !

ADD REPLYlink written 14 months ago by Bogdan910
0
gravatar for rishabh3196sharma
22 days ago by
rishabh3196sharma0 wrote:

@Kevin Blighe dear Kevin Blighe,

i had the gsa microarray data in which i wanted to identify the cnv's .... the cnv were known through illumina genome studio software which gave txt file like this.

SampleID BookmarkType Chr Start End Size Author CreatedDate Value Comment 203389250004_R01C01 [1] CNV Bin: Min 2.5 To Max 3.5 3 197788369 197871052 82683 12/28/2019 7:31:44 PM 3 CNV Confidence: 58.26665 203389250004_R01C01 [1] CNV Bin: Min 0.5 To Max 1.5 9 105596881 105835972 239091 12/28/2019 7:31:44 PM 1 CNV Confidence: 95.1899

for annotation of this txt file with annovar tool i converted it into vcf format which looked something like this

Chr Start End Size Value Comment BookmarkType INFO

3 197788369 197871052 82683 3 CNV Confidence: 58.26665 CNV Bin: Min 2.5 To Max 3.5 203389250004_R01C01 [1],,12/28/2019 7:31:44 PM 9 105596881 105835972 239091 1 CNV Confidence: 95.1899 CNV Bin: Min 0.5 To Max 1.5 203389250004_R01C01 [1],,12/28/2019 7:31:44 PM

i want to know is this file good for inputting it into annovar and also how to annotate this vcf file using annovar, once i have downloaded all the database defined by you in this post earlier i have followed the commands given by you

perl annovar/convert2annovar.pl -format vcf4 -withzyg --allsample -outfile anotation.ann input.vcf

perl annovar/table_annovar.pl anotation.ann annovar/humandb/ -buildve hg19 -remove -otherinfo -onetranscript -protocol refGene,cytoBand,esp6500siv2_all,exac03,dbnsfp35c,avsnp150,cosmic70 -operation g,r,f,f,f,f,f -nastring "NA" -csvout it generates file something like this

Chr,Start,End,Ref,Alt,Func.refGene,Gene.refGene,GeneDetail.refGene,ExonicFunc.refGene,AAChange.refGene,cytoBand,esp6500siv2_all,ExAC_ALL,ExAC_AFR,ExAC_AMR,ExAC_EAS,ExAC_FIN,ExAC_NFE,ExAC_OTH,ExAC_SAS,SIFT_score,SIFT_converted_rankscore,SIFT_pred,LRT_score,LRT_converted_rankscore,LRT_pred,MutationTaster_score,MutationTaster_converted_rankscore,MutationTaster_pred,MutationAssessor_score,MutationAssessor_score_rankscore,MutationAssessor_pred,FATHMM_score,FATHMM_converted_rankscore,FATHMM_pred,PROVEAN_score,PROVEAN_converted_rankscore,PROVEAN_pred,MetaSVM_score,MetaSVM_rankscore,MetaSVM_pred,MetaLR_score,MetaLR_rankscore,MetaLR_pred,M-CAP_score,M-CAP_rankscore,M-CAP_pred,MutPred_score,MutPred_rankscore,fathmm-MKL_coding_score,fathmm-MKL_coding_rankscore,fathmm-MKL_coding_pred,Eigen_coding_or_noncoding,Eigen-raw,Eigen-PC-raw,GenoCanyon_score,GenoCanyon_score_rankscore,integrated_fitCons_score,integrated_fitCons_score_rankscore,integrated_confidence_value,GERP++_RS,GERP++_RS_rankscore,phyloP100way_vertebrate,phyloP100way_vertebrate_rankscore,phyloP20way_mammalian,phyloP20way_mammalian_rankscore,phastCons100way_vertebrate,phastCons100way_vertebrate_rankscore,phastCons20way_mammalian,phastCons20way_mammalian_rankscore,SiPhy_29way_logOdds,SiPhy_29way_logOdds_rankscore,Interpro_domain,GTEx_V6p_gene,GTEx_V6p_tissue,avsnp150,cosmic70,Otherinfo1,Otherinfo2,Otherinfo3 3,197788369,197788369,0,0,"ncRNA_intronic","ANKRD18DP",NA,NA,NA,"3q29",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,". CNV Confidence: 58.26665

i want to know the rsids of the cnv for this particular position only and so on in my file. Can you please help me out with this and if i am wrong somewhere please correct me.

thankyou,

ADD COMMENTlink written 22 days ago by rishabh3196sharma0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2211 users visited in the last hour