I know this is kind of a long shot, but worth a try! Does anyone have experience working with the data dump for the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer:
I have extracted and rebuild the database with sqlite3, and I am trying to retrieve gene fusion information (all of them), with there associated disease, sample counts, and PubMed references. No matter what I try things never seem to match up with what appears in the web portal.
Where are the two queries I am currently working with. The first one selects each fusion (I only want GeneA/GeneB fusions, and it lists single gene aberrations in the table anyways, also blank entries sometimes??). There are duplicate rows (one for each chromosome of the fusions) which I why I have the GROUP BY at the end.
This query seems to be working.
SELECT rec.Gene, rec.Total_Cases, kod.Benamning FROM RECCURENT_DATA as rec INNER JOIN Koder as kod ON rec.Code = kod.Kod WHERE rec.Gene != "" AND rec.Gene LIKE '%/%' AND kod.KodTyp = 'MORPH' GROUP BY rec.Gene, rec.Total_Cases, kod.Benamning
Then I have this one to TRY and get the associated PubMed ids via the MolClinGene table.
SELECT mol.Gene, GROUP_CONCAT(DISTINCT(ref.Pubmed)) FROM MolClinGene as mol INNER JOIN REFERENCE as ref ON mol.RefNo = ref.RefNo WHERE mol.Gene IN (#LIST OF FUSION NAMES FROM ABOVE#) GROUP BY mol.Gene
Note that I run the first, and keep a list of the fusions I need to lookup (in python) and then put that in the WHERE clause of the second one. This part fails terribly and I seem to get the right pubs sometimes, the wrong ones sometimes, extra ones sometimes and missing ones sometimes. Maybe someone has done something similar!