Question: How to create dummy VCF with fusion variants
2.3 years ago by
United States
Jackie70 wrote:

I am developing a tool which would take a gene list as input, and the output would be a VCF including fusion variants involving the genes of interest.

I am going to use the Mitelman database, which has a comprehensive curation of fusions from the published literature. However, its breakpoints are given as cytobands (e.g., 19p13) rather than as genomic coordinates (e.g., chr1 10099767).

My questions are:
- Are there any other comprehensive databases you would recommend that give precise breakpoints as genomic coordinates? (I tried COSMIC and TCGA, but their fusion files are not as comprehensive; many well-known fusions seem to be missing from them.)
- If Mitelman turns out to be the most comprehensive database, how would you suggest I find the corresponding genomic coordinates for the Mitelman fusions in an easy way?


modified 2.3 years ago by d-cameron2.2k • written 2.3 years ago by Jackie70

Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer

written 2.3 years ago by Pierre Lindenbaum129k

Thank you, Pierre, for the answer. I have actually already downloaded the raw data from the Mitelman website, but the question now is how to convert the breakpoints from cytoband format into genomic coordinates, as that is what would be needed for generating a VCF. I think I can download a cytoband annotation file for converting between genomic coordinates and cytobands, but I wonder whether there is an easier way.
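
A minimal sketch of that cytoband lookup, assuming an annotation file in the UCSC cytoBand.txt layout (tab-separated chrom, chromStart, chromEnd, name, gieStain). The rows below are toy values with made-up coordinates, and `load_bands` and `cytoband_to_interval` are hypothetical helper names. A coarse band such as 19p13 is resolved by merging every sub-band whose name starts with it, so the result is a wide interval, not a breakpoint.

```python
import csv
import io

# Toy rows in UCSC cytoBand.txt layout: chrom, chromStart, chromEnd, name, gieStain.
# The coordinates are made up for illustration; use the real file for your genome build.
TOY_CYTOBAND = (
    "chr19\t0\t1000\tp13.3\tgneg\n"
    "chr19\t1000\t2500\tp13.2\tgpos25\n"
    "chr19\t2500\t4000\tp13.1\tgneg\n"
    "chr19\t4000\t5000\tp12\tgvar\n"
)


def load_bands(handle):
    """Read cytoband rows into a list of (chrom, start, end, band) tuples."""
    bands = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        bands.append((row[0], int(row[1]), int(row[2]), row[3]))
    return bands


def cytoband_to_interval(bands, chrom, band):
    """Return (start, end) spanning every sub-band whose name starts with `band`."""
    hits = [(s, e) for c, s, e, n in bands if c == chrom and n.startswith(band)]
    if not hits:
        raise KeyError(f"no band {band} on {chrom}")
    return min(s for s, _ in hits), max(e for _, e in hits)


bands = load_bands(io.StringIO(TOY_CYTOBAND))
interval = cytoband_to_interval(bands, "chr19", "p13")
print(interval)  # (0, 4000) for the toy rows above
```

With a real file, replace the `io.StringIO` handle with an opened cytoBand.txt for the same genome build as the target VCF.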


written 2.3 years ago by Jackie70

It wasn't an answer, just a hyperlink, because I had no idea what "...using is Mitelman which has ..." meant.

modified 2.3 years ago • written 2.3 years ago by Pierre Lindenbaum129k

Sorry about the confusion, and thanks for providing the hyperlink. I have updated the post by inserting the hyperlink for Mitelman.

written 2.3 years ago by Jackie70
2.3 years ago by
d-cameron2.2k wrote:

You need a higher-resolution database. If you only have the cytoband, that's even lower resolution than the gene name. That's not your only problem though. Even if you did have coordinates, identical somatic driver gene fusions are not always at the same genomic position. E.g., a fusion of geneA exons 1-2 to geneB exons 4-6 would occur for a breakpoint anywhere in geneA intron 2 joined to a breakpoint anywhere in geneB intron 3. It's even more complicated than that because, for some fusions, a functional fusion transcript can be generated by several possible exon combinations (e.g. at least the first 2 exons of geneA connected to at least the last 2 exons of geneB).

In summary, there are many possible genomic coordinate pairs that will all result in the same fusion transcript.
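
To make that concrete, here is a rough sketch that counts the coordinate pairs for one exon combination and writes a single dummy breakend (BND) pair in VCF notation. Everything in it is an assumption for illustration: the exon coordinates and chromosome names are invented, both genes are taken to be on the forward strand, and `intron_after` and `bnd_pair` are hypothetical helper names. It treats any breakpoint in the intron after geneA exon 2, paired with any breakpoint in the intron before geneB exon 4, as giving the same exon 2 to exon 4 junction.

```python
def intron_after(exons, exon_number):
    """1-based inclusive interval of the intron that follows exon `exon_number`.

    `exons` lists (start, end) per exon of a forward-strand gene, in transcript order.
    """
    prev_end = exons[exon_number - 1][1]
    next_start = exons[exon_number][0]
    return prev_end + 1, next_start - 1


def bnd_pair(chrom_a, pos_a, chrom_b, pos_b, name):
    """Two VCF BND lines joining sequence up to pos_a with sequence from pos_b onward.

    REF is written as N for a dummy record; a real VCF would carry the reference base.
    """
    fmt = "{}\t{}\t{}\tN\t{}\t.\t.\tSVTYPE=BND;MATEID={}"
    id_a, id_b = f"{name}_1", f"{name}_2"
    return [
        fmt.format(chrom_a, pos_a, id_a, f"N[{chrom_b}:{pos_b}[", id_b),
        fmt.format(chrom_b, pos_b, id_b, f"]{chrom_a}:{pos_a}]N", id_a),
    ]


# Hypothetical exon coordinates (start, end), forward strand.
GENE_A = [(1000, 1100), (2000, 2100), (5000, 5200)]
GENE_B = [(100, 200), (400, 500), (900, 1000),
          (3000, 3100), (3500, 3600), (4000, 4200)]

a_range = intron_after(GENE_A, 2)  # geneA intron 2
b_range = intron_after(GENE_B, 3)  # geneB intron 3, just upstream of exon 4

# Every (a, b) position pair across the two introns gives the same exon junction.
n_pairs = (a_range[1] - a_range[0] + 1) * (b_range[1] - b_range[0] + 1)
print(a_range, b_range, n_pairs)

# One arbitrary representative out of those pairs, as a dummy record.
records = bnd_pair("chrA", a_range[0], "chrB", b_range[1], "fusionAB")
print("\n".join(records))
```

Even with these small toy introns there are millions of equivalent pairs, so any single record in a dummy VCF is one arbitrary representative of the fusion.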

modified 2.3 years ago • written 2.3 years ago by d-cameron2.2k