Identifying Homologues In Drosophila For Human Genes On Large Scale
12.2 years ago
Wayne ★ 1.0k

I have a list of genes that are mutated in human cancer samples, and a collaborator who has a relevant genetic screen in Drosophila. I need to identify the Drosophila homologue of each mutated human gene in order to test them in this screen. I know this is a very complex problem, and I understand the complexities of homology, but my list is too long to do each gene manually. I have already consulted the HomoloGene database at NCBI and identified a good homologue for some of the genes, but many remain. Does anyone know of additional, trustworthy databases that catalogue this sort of information? If not, any advice on how to tackle the problem would be greatly appreciated. Thank you for your time!

homology database • 12k views
12.2 years ago

Adapted from

  1. Go to:
  2. Choose “Ensembl 64”
  3. Choose “Homo sapiens genes (GRCh37.p5)”
  4. Click on “Filters” in the left menu
  5. Unfold the “MULTI SPECIES COMPARISONS” box, tick the “Homolog filters” option and choose “Orthologous Drosophila Genes” from the drop-down menu.
  6. Click on “Attributes” in the left menu
  7. Click on “Homologs”
  8. Unfold the “ORTHOLOGS” box and tick the data you want to get from under the Drosophila Orthologs header (most probably the gene ID and maybe the homology type as well).
  9. Click on the “Results” button (top left)
  10. Choose your favorite output

This will give you a list of orthologues for all human genes. To restrict the results to a subset, use the GENE -> ID list limit filter and paste in the list of IDs for your subset of interest.
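
As a minimal sketch of what to do with the export, assuming it was saved as a tab-separated file with a header row, the rows can be grouped per human gene so that one-to-many relationships stay visible. The column names below are assumptions about the export header, not something stated in the steps above, and the sample rows are made up; adjust both to match your own file.

```python
import csv
import io
from collections import defaultdict

# Column names are assumptions; match them to the header of your own export.
HUMAN_COL = "Gene stable ID"
FLY_COL = "Drosophila gene stable ID"
TYPE_COL = "Drosophila homology type"


def group_orthologues(tsv_handle, wanted_ids=None):
    """Map each human gene ID to a sorted list of (fly gene ID, homology type)."""
    grouped = defaultdict(set)
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        human, fly = row[HUMAN_COL], row[FLY_COL]
        if not fly:
            continue  # no orthologue reported on this row
        if wanted_ids is not None and human not in wanted_ids:
            continue  # gene is not in the subset of interest
        grouped[human].add((fly, row[TYPE_COL]))
    return {gene: sorted(hits) for gene, hits in grouped.items()}


# Made-up rows, for illustration only.
example = io.StringIO(
    "Gene stable ID\tDrosophila gene stable ID\tDrosophila homology type\n"
    "HUMAN_A\tFLY_1\tortholog_one2one\n"
    "HUMAN_B\tFLY_2\tortholog_one2many\n"
    "HUMAN_B\tFLY_3\tortholog_one2many\n"
    "HUMAN_C\t\t\n"
)
result = group_orthologues(example, wanted_ids={"HUMAN_A", "HUMAN_B", "HUMAN_C"})
for gene, hits in result.items():
    print(gene, hits)
```

Genes from your list that are missing from the result (HUMAN_C here) are the ones with no reported orthologue, and are the candidates for a manual or BLAST-based follow-up.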

I trust Compara about as much as I trust any in silico results I haven't produced myself. Read more about how Compara gene trees are constructed here to form your own judgement:

This is an incredible resource. How much do you trust it, though? A lot of money is riding on this screen, and I want to express the appropriate amount of caution to my collaborators regarding its accuracy.

This is a good approach and I've voted it up. Two issues I have with pre-calculated results are 1) When was this done and how many new genes/mRNA isoforms were not part of the comparisons? 2) What comparison parameters were used to make the assignments? Those are often not well described.

12.2 years ago

I did this, also for a disease project (not cancer), using BLASTP similarity searches with E-value cutoffs of 1e-40, 1e-50 and 1e-100. Be aware that there will not always be a 1:1 correspondence between Drosophila and human genes.
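
The thresholding step can be sketched as follows, assuming the BLASTP hits were saved in the standard 12-column tabular format (`-outfmt 6`), where column 11 holds the E-value. The function name and the sample IDs are hypothetical; this illustrates the filtering idea only and is not the exact procedure used for the project described above.

```python
from collections import defaultdict


def hits_below_cutoff(lines, max_evalue):
    """Return {query ID: [subject IDs]} for hits with E-value <= max_evalue."""
    kept = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue and subject not in kept[query]:
            kept[query].append(subject)
    return dict(kept)


# Made-up hits: query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, E-value, bit score.
sample = [
    "humanP1\tflyA\t62.0\t500\t190\t3\t1\t500\t1\t498\t1e-120\t420",
    "humanP1\tflyB\t48.0\t480\t250\t6\t5\t484\t9\t480\t3e-55\t190",
    "humanP2\tflyC\t35.0\t300\t195\t8\t2\t301\t4\t299\t2e-42\t150",
]
for cutoff in (1e-40, 1e-50, 1e-100):
    print(cutoff, hits_below_cutoff(sample, cutoff))
```

Running the same hits through several cutoffs shows how a query can have two fly candidates at a loose threshold and only one at a strict threshold, which is one way the lack of a 1:1 correspondence shows up in practice.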

That's a good idea. Did you download the database to do this, or use a batch query? I've never run BLAST with a list before.

That's a good idea. How did you go about doing this? I have a list of mRNA accession numbers and BLAST installed locally, but I am not too familiar with large queries like this. Is there a batch query option on the website? And how did you verify the results when the BLAST run finished? Thanks a ton for your help!

You can either do a batch query to convert the mRNA IDs to sequence queries, or use the IDs themselves as queries, if your local installation of BLAST allows this in the same way as the NCBI BLAST page. A batch query is possible at NCBI via the web; see the help pages for details. In brief, verification was done by using the human hit as query back against a Drosophila database of RefSeq proteins.
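
A reciprocal check of this kind can be sketched as below, assuming both searches (one direction, then the hits searched back in the other direction) were saved in 12-column tabular format with the bit score in column 12. The function names and IDs are hypothetical, and keeping only the single top-scoring hit per query is a simplification that ignores ties and one-to-many relationships.

```python
def best_hits(lines):
    """Return {query ID: subject ID of its highest bit-score hit}."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete tabular hit line
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_lines, reverse_lines):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's best hit."""
    forward = best_hits(forward_lines)
    reverse = best_hits(reverse_lines)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up hits, for illustration only.
human_vs_fly = [
    "humanP1\tflyA\t62.0\t500\t190\t3\t1\t500\t1\t498\t1e-120\t420",
    "humanP1\tflyB\t48.0\t480\t250\t6\t5\t484\t9\t480\t3e-55\t190",
    "humanP2\tflyA\t35.0\t300\t195\t8\t2\t301\t4\t299\t2e-42\t150",
]
fly_vs_human = [
    "flyA\thumanP1\t62.0\t500\t190\t3\t1\t498\t1\t500\t1e-118\t415",
    "flyA\thumanP2\t35.0\t300\t195\t8\t4\t299\t2\t301\t5e-40\t140",
    "flyB\thumanP1\t48.0\t480\t250\t6\t9\t480\t5\t484\t2e-54\t185",
]
pairs = reciprocal_best_hits(human_vs_fly, fly_vs_human)
print(pairs)
```

Pairs that fail the reciprocal test (humanP2 here, whose best fly hit prefers a different human protein) are not necessarily wrong, but they are the ones worth inspecting by hand before committing them to a screen.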

12.2 years ago

OK, I don't know much about Drosophila, but it seems that UCSC has already computed the BLAST alignments in dm3.hgBlastTab. I've tried to link the table flyBaseGene to the human table kgXref, which describes the human genes, via dm3.flyBaseGene:

mysql  --user=genome -A -D dm3
mysql> select * from
dm3.flyBaseGene as F ,
dm3.hgBlastTab as B,
hg19.knownGene as G,
hg19.kgXref as X
where F.name = B.query and B.target = G.name and G.name = X.kgID
limit 1\G

*************************** 1. row ***************************
        bin: 637
       name: CG10354-RA
      chrom: chr2L
     strand: +
    txStart: 6908086
      txEnd: 6911432
   cdsStart: 6908165
     cdsEnd: 6910892
  exonCount: 1
 exonStarts: 6908086,
   exonEnds: 6911432,
      query: CG10354-RA
     target: uc002wsf.1
   identity: 56.59
  aliLength: 827
   mismatch: 316
    gapOpen: 11
     qStart: 0
       qEnd: 789
     tStart: 0
       tEnd: 822
     eValue: 0
   bitScore: 889
       name: uc002wsf.1
      chrom: chr20
     strand: +
    txStart: 21283941
      txEnd: 21370463
   cdsStart: 21284036
     cdsEnd: 21369976
  exonCount: 30
 exonStarts: 21283941,21306916,21307127,21309196,21311118,21311253,21312198,21312405,21312920,21314181,21314341,21314574,21314715,21319681,21321358,21324727,21327052,21328783,21328978,21330026,21335426,21336717,21337223,21338373,21346058,21346210,21349100,21362631,21367505,21369910,
   exonEnds: 21284111,21307044,21307239,21309308,21311177,21311343,21312271,21312456,21313078,21314256,21314475,21314632,21314823,21319726,21321490,21324846,21327188,21328891,21329068,21330099,21335510,21336815,21337303,21338430,21346127,21346342,21349228,21362695,21367644,21370463,
  proteinID: Q9H0D6
    alignID: uc002wsf.1
       kgID: uc002wsf.1
       mRNA: NM_012255
       spID: Q9H0D6
spDisplayID: XRN2_HUMAN
 geneSymbol: XRN2
     refseq: NM_012255
    protAcc: NP_036387
description: 5'-3' exoribonuclease 2
1 row in set (0.22 sec)
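
For readers less familiar with SQL, the linking logic can be illustrated on a toy in-memory database. This sketch uses Python's `sqlite3` rather than the UCSC MySQL server, keeps only a few of the columns shown in the output above, and leaves out `hg19.knownGene` for brevity. The join keys (fly gene name to BLAST query, BLAST target to kgID) are an assumption inferred from the matching values in that output, and the tables hold only the single example row, so this demonstrates the idea and nothing more.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE flyBaseGene (name TEXT, chrom TEXT);
    CREATE TABLE hgBlastTab (query TEXT, target TEXT, identity REAL, eValue REAL);
    CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT, refseq TEXT);
    INSERT INTO flyBaseGene VALUES ('CG10354-RA', 'chr2L');
    INSERT INTO hgBlastTab VALUES ('CG10354-RA', 'uc002wsf.1', 56.59, 0);
    INSERT INTO kgXref VALUES ('uc002wsf.1', 'XRN2', 'NM_012255');
    """
)

# Fly gene -> precomputed BLAST hit -> human gene annotation,
# restricted to one human gene symbol of interest.
rows = con.execute(
    """
    SELECT F.name, B.identity, X.geneSymbol, X.refseq
    FROM flyBaseGene AS F
    JOIN hgBlastTab AS B ON F.name = B.query
    JOIN kgXref AS X ON B.target = X.kgID
    WHERE X.geneSymbol = ?
    """,
    ("XRN2",),
).fetchall()
print(rows)
```

Each table contributes one step of the chain: the fly gene table names the gene, the BLAST table records which human transcript it aligned to, and the cross-reference table turns that transcript ID into a gene symbol and RefSeq accession.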

I apologize, I am not too familiar with SQL or with what you are doing here. Could you provide a little more description and perhaps point me to some reading? Is dm3.hgBlastTab a precomputed alignment against other genomes? So the idea is to get the relevant rows from the fly table and link the two based on the alignment? Thanks for your time, I really appreciate it.
