Question: Help with Mitelman Database
1
gravatar for mbio.kyle
3.0 years ago by
mbio.kyle300
United States
mbio.kyle300 wrote:

I know this is kind of a long shot, but worth a try! Does anyone have experience working with the data dump for the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer:

ftp://ftp1.nci.nih.gov/pub/CGAP/mitelman.tar.gz

I have extracted and rebuild the database with sqlite3, and I am trying to retrieve gene fusion information (all of them), with there associated disease, sample counts, and PubMed references. No matter what I try things never seem to match up with what appears in the web portal.

Where are the two queries I am currently working with. The first one selects each fusion (I only want GeneA/GeneB fusions, and it lists single gene aberrations in the table anyways, also blank entries sometimes??). There are duplicate rows (one for each chromosome of the fusions) which I why I have the GROUP BY at the end.

This query seems to be working.

SELECT
   rec.Gene,
   rec.Total_Cases,
   kod.Benamning
FROM RECCURENT_DATA as rec
INNER JOIN Koder as kod
ON rec.Code = kod.Kod
WHERE rec.Gene != ""
AND rec.Gene LIKE '%/%'
AND kod.KodTyp = 'MORPH'
GROUP BY rec.Gene, rec.Total_Cases, kod.Benamning

Then I have this one to TRY and get the associated PubMed ids via the MolClinGene table.

SELECT
    mol.Gene,
    GROUP_CONCAT(DISTINCT(ref.Pubmed))
FROM MolClinGene as mol
INNER JOIN REFERENCE as ref
ON mol.RefNo = ref.RefNo
WHERE mol.Gene IN (#LIST OF FUSION NAMES FROM ABOVE#)
GROUP BY mol.Gene

Note that I run the first, and keep a list of the fusions I need to lookup (in python) and then put that in the WHERE clause of the second one. This part fails terribly and I seem to get the right pubs sometimes, the wrong ones sometimes, extra ones sometimes and missing ones sometimes. Maybe someone has done something similar!

Thanks!

cancer database mitelman fusion • 1.1k views
ADD COMMENTlink modified 10 weeks ago by Alex0 • written 3.0 years ago by mbio.kyle300
0
gravatar for mastal511
3.0 years ago by
mastal5112.0k
mastal5112.0k wrote:

The second SELECT statement that you posted looks like it's missing a JOIN clause.

ADD COMMENTlink written 3.0 years ago by mastal5112.0k

Whoops, sorry about that. I updated it.

ADD REPLYlink written 3.0 years ago by mbio.kyle300
0
gravatar for Alex
10 weeks ago by
Alex0
Alex0 wrote:

I ran into a similar problem. I had to inner join on KaryAbnorm and Cytogen to finally select the pubmed ids together with codes and abberations.

The results correspond to the different references that are shown on Mitelman when clicking on an abberation.

SELECT
   rec.ABERRATION as aberration,
   rec.GENE as gene,
   kod.Benamning as morphology,
   rec.Code as morphology_id,
   X.Pubmed as id_pubmed

FROM RECURRENT_DATA as rec

INNER JOIN Koder as kod
ON rec.Code = kod.Kod

INNER JOIN (
    SELECT 
        kary.Abnormality as Abnormality,
        cyt.morph as Code,
        ref.PubMed as Pubmed
    FROM KaryAbnorm as kary
    INNER JOIN Cytogen as cyt
    ON cyt.RefNo = kary.RefNo AND cyt.CaseNo = kary.CaseNo
    INNER JOIN Reference as ref
    ON cyt.RefNo = ref.RefNo
) as x 
ON x.Abnormality = rec.ABERRATION and x.Code = rec.Code

WHERE rec.Gene != ""
AND rec.Gene LIKE '%/%'
AND kod.KodTyp = 'MORPH'

GROUP BY rec.ABERRATION, rec.CODE, rec.GENE, X.Pubmed
ADD COMMENTlink written 10 weeks ago by Alex0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 773 users visited in the last hour