Question

Help with Mitelman Database

1

Entering edit mode

8.1 years ago

mbio.kyle ▴ 380

I know this is kind of a long shot, but worth a try! Does anyone have experience working with the data dump for the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer:

ftp://ftp1.nci.nih.gov/pub/CGAP/mitelman.tar.gz

I have extracted and rebuild the database with sqlite3, and I am trying to retrieve gene fusion information (all of them), with there associated disease, sample counts, and PubMed references. No matter what I try things never seem to match up with what appears in the web portal.

Where are the two queries I am currently working with. The first one selects each fusion (I only want GeneA/GeneB fusions, and it lists single gene aberrations in the table anyways, also blank entries sometimes??). There are duplicate rows (one for each chromosome of the fusions) which I why I have the GROUP BY at the end.

This query seems to be working.

SELECT
   rec.Gene,
   rec.Total_Cases,
   kod.Benamning
FROM RECCURENT_DATA as rec
INNER JOIN Koder as kod
ON rec.Code = kod.Kod
WHERE rec.Gene != ""
AND rec.Gene LIKE '%/%'
AND kod.KodTyp = 'MORPH'
GROUP BY rec.Gene, rec.Total_Cases, kod.Benamning

Then I have this one to TRY and get the associated PubMed ids via the MolClinGene table.

SELECT
    mol.Gene,
    GROUP_CONCAT(DISTINCT(ref.Pubmed))
FROM MolClinGene as mol
INNER JOIN REFERENCE as ref
ON mol.RefNo = ref.RefNo
WHERE mol.Gene IN (#LIST OF FUSION NAMES FROM ABOVE#)
GROUP BY mol.Gene

Note that I run the first, and keep a list of the fusions I need to lookup (in python) and then put that in the WHERE clause of the second one. This part fails terribly and I seem to get the right pubs sometimes, the wrong ones sometimes, extra ones sometimes and missing ones sometimes. Maybe someone has done something similar!

Thanks!

fusion database mitelman cancer • 2.6k views

ADD COMMENT • link updated 5.3 years ago by Alex • 0 • written 8.1 years ago by mbio.kyle ▴ 380

score 0 · Answer 1 · 2016-03-10

0

Entering edit mode

8.1 years ago

mastal511 ★ 2.1k

The second SELECT statement that you posted looks like it's missing a JOIN clause.

ADD COMMENT • link 8.1 years ago by mastal511 ★ 2.1k

0

Entering edit mode

Whoops, sorry about that. I updated it.

ADD REPLY • link 8.1 years ago by mbio.kyle ▴ 380

score 0 · Answer 2 · 2019-01-07

I ran into a similar problem. I had to inner join on KaryAbnorm and Cytogen to finally select the pubmed ids together with codes and abberations.

The results correspond to the different references that are shown on Mitelman when clicking on an abberation.

SELECT
   rec.ABERRATION as aberration,
   rec.GENE as gene,
   kod.Benamning as morphology,
   rec.Code as morphology_id,
   X.Pubmed as id_pubmed

FROM RECURRENT_DATA as rec

INNER JOIN Koder as kod
ON rec.Code = kod.Kod

INNER JOIN (
    SELECT 
        kary.Abnormality as Abnormality,
        cyt.morph as Code,
        ref.PubMed as Pubmed
    FROM KaryAbnorm as kary
    INNER JOIN Cytogen as cyt
    ON cyt.RefNo = kary.RefNo AND cyt.CaseNo = kary.CaseNo
    INNER JOIN Reference as ref
    ON cyt.RefNo = ref.RefNo
) as x 
ON x.Abnormality = rec.ABERRATION and x.Code = rec.Code

WHERE rec.Gene != ""
AND rec.Gene LIKE '%/%'
AND kod.KodTyp = 'MORPH'

GROUP BY rec.ABERRATION, rec.CODE, rec.GENE, X.Pubmed