Question

Identifying Homologues In Drosophila For Human Genes On Large Scale

4

12.9 years ago

Wayne ★ 1.0k

I have a list of genes that are mutated in human cancer samples and a collaborator who has a relevant genetic screen in Drosophila. I need to identify the homologues of each mutated human gene in Drosophila in order to test them in this genetic screen. I know this is a very complex problem, and I understand the complexities of homology. Still, my list is too long to do each gene manually. I have already consulted the HomoloGene database at NCBI and have been able to identify a good homologue for some of the genes, but many remain. Does anyone know of additional, preferably trustworthy, databases that catalogue this sort of information? If not, any advice on how to tackle the problem would be greatly appreciated. Thank you for your time!

homology database • 12k views

ADD COMMENT • link updated 12.6 years ago by Casey Bergman 18k • written 12.9 years ago by Wayne ★ 1.0k

Ram · Answer 1 · 2011-12-06

7

12.9 years ago

Casey Bergman 18k

Adapted from http://www.ensembl.info/blog/2009/01/21/how-to-get-all-the-orthologous-genes-between-two-species/

Go to: http://www.ensembl.org/biomart/martview
Choose “Ensembl 64”
Choose “Homo sapiens genes (GRCh37.p5)”
Click on “Filters” in the left menu
Unfold the “MULTI SPECIES COMPARISONS” box, tick the “Homolog filters” option and choose “Orthologous Drosophila Genes” from the drop-down menu.
Click on “Attributes” in the left menu
Click on “Homologs”
Unfold the “ORTHOLOGS” box and tick the data you want to get from under the Drosophila Orthologs header (most probably the gene ID and maybe the homology type as well).
Click on the “Results” button (top left)
Choose your favorite output

This will give you a list of orthologues for all human genes. To restrict the results to a subset, use the GENE -> "ID list limit" filter and paste in a list of IDs for your subset of interest.
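If you would rather script this than click through the web interface, the same kind of query can be sent to the BioMart service as XML. The following is only a minimal sketch: the endpoint path, the dataset name `hsapiens_gene_ensembl`, the filter `with_dmelanogaster_homolog` and the `dmelanogaster_homolog_*` attribute names are my assumptions about the internal names in a current Ensembl release (they are not taken from Ensembl 64), so check them against the XML that the web interface can export before relying on them. The helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and internal BioMart names; verify them in the web interface.
MART_URL = "http://www.ensembl.org/biomart/martservice"
ATTRIBUTES = [
    "ensembl_gene_id",
    "dmelanogaster_homolog_ensembl_gene",
    "dmelanogaster_homolog_orthology_type",
]


def build_query(gene_ids=None):
    """Build the BioMart XML query, optionally restricted to a list of gene IDs."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    dataset = ET.SubElement(query, "Dataset", name="hsapiens_gene_ensembl")
    # Boolean filter: keep only human genes that have a Drosophila orthologue.
    ET.SubElement(dataset, "Filter", name="with_dmelanogaster_homolog", excluded="0")
    if gene_ids:
        # Equivalent of pasting a subset into the "ID list limit" filter.
        ET.SubElement(dataset, "Filter", name="ensembl_gene_id",
                      value=",".join(gene_ids))
    for attribute in ATTRIBUTES:
        ET.SubElement(dataset, "Attribute", name=attribute)
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def parse_tsv(text):
    """Turn the TSV response into {human_gene: [(fly_gene, orthology_type), ...]}."""
    orthologues = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 3 or not fields[1]:
            continue  # skip malformed rows and rows without a fly gene
        human, fly, kind = fields
        orthologues.setdefault(human, []).append((fly, kind))
    return orthologues


def fetch_orthologues(gene_ids=None):
    """Send the query over HTTP (needs network access) and parse the result."""
    url = MART_URL + "?" + urlencode({"query": build_query(gene_ids)})
    with urlopen(url) as response:
        return parse_tsv(response.read().decode("utf-8"))
```

Calling `fetch_orthologues(list_of_ensembl_gene_ids)` would then return one entry per human gene, each with a list of fly genes, which makes the one-to-many cases easy to spot.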

ADD COMMENT • link updated 5.1 years ago by Ram 44k • written 12.9 years ago by Casey Bergman 18k

1

I trust Compara at pretty much the highest level that I do for in silico results I haven't produced myself. Read more about how Compara gene trees are constructed here to form your own judgement: http://www.ensembl.org/info/docs/compara/homology_method.html

ADD REPLY • link 12.9 years ago by Casey Bergman 18k

0

This is an incredible resource. How much do you trust it though? A lot of money is riding on this screen and I want to express the appropriate amount of caution to my collaborators regarding its accuracy.

ADD REPLY • link 12.9 years ago by Wayne ★ 1.0k

0

This is a good approach and I've voted it up. Two issues I have with pre-calculated results are 1) When was this done and how many new genes/mRNA isoforms were not part of the comparisons? 2) What comparison parameters were used to make the assignments? Those are often not well described.

ADD REPLY • link 12.9 years ago by Larry_Parnell 16k

score 2 · Answer 2 · 2011-12-06

2

12.9 years ago

Larry_Parnell 16k

I did this, also for a disease project (not cancer), by BLASTP similarity searches with E-value cutoffs of 1e-40, 1e-50 and 1e-100. Be aware that there will not always be a 1:1 correspondence of Drosophila to human.
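To illustrate the cutoff idea, here is a minimal sketch that filters BLAST tabular output at the three thresholds and collects the hits per query, which makes the non-1:1 cases visible. It assumes the standard 12-column tabular format (as produced by BLAST+ with `-outfmt 6`, where the E-value is the 11th column); the function name and the sample identifiers are hypothetical, and this is not necessarily how the original analysis was scripted.

```python
import csv
import io
from collections import defaultdict

# The three E-value cutoffs mentioned above.
CUTOFFS = (1e-40, 1e-50, 1e-100)


def hits_by_cutoff(tabular_text, cutoffs=CUTOFFS):
    """Return {cutoff: {query: set(subjects)}} for hits with E-value <= cutoff.

    Assumes 12 tab-separated columns per row:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    result = {cutoff: defaultdict(set) for cutoff in cutoffs}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue  # skip comment lines and malformed rows
        query, subject, evalue = row[0], row[1], float(row[10])
        for cutoff in cutoffs:
            if evalue <= cutoff:
                result[cutoff][query].add(subject)
    return result


def multi_hit_queries(hits):
    """Queries with more than one subject at a cutoff, i.e. not a 1:1 mapping."""
    return sorted(query for query, subjects in hits.items() if len(subjects) > 1)
```

Running `multi_hit_queries(hits_by_cutoff(text)[1e-40])` on your own BLAST output lists the queries that still have several candidate homologues at the loosest cutoff; tightening the cutoff usually shrinks that list at the cost of losing more distant homologues.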

ADD COMMENT • link 12.9 years ago by Larry_Parnell 16k

0

That's a good idea. Did you download the database to do this or use a batch query? I've never BLASTed with a list before.

ADD REPLY • link 12.9 years ago by Wayne ★ 1.0k

0

That's a good idea. How did you go about doing this? I have a list of mRNA accession numbers and I do have BLAST installed locally, but I am not too familiar with large queries like this. Is there a batch query option on the website? And how did you verify the results when the BLAST finished? Thanks a ton for your help!

ADD REPLY • link 12.9 years ago by Wayne ★ 1.0k

0

You can either do a batch query to convert the mRNA IDs to sequence queries, or you can use the IDs themselves as queries, if your local installation of BLAST allows this in the same way as the NCBI BLAST page does. A batch query is possible at NCBI via the web; see the help pages for details. In brief, verification was done by using the human hit as query back against a Drosophila database of RefSeq proteins.
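One common way to formalise this kind of back-check is a reciprocal best hit test: keep a pair only if each sequence is the other's top-scoring hit. The sketch below assumes you have already reduced both BLAST runs to `(query, subject, bitscore)` tuples; the function names and identifiers are hypothetical, and it may not match exactly how the verification described above was carried out.

```python
def best_hits(rows):
    """Map each query to its highest-scoring subject.

    rows: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_rows, reverse_rows):
    """Keep forward pairs whose subject's best hit in the reverse search is the query."""
    forward = best_hits(forward_rows)
    reverse = best_hits(reverse_rows)
    return {query: subject for query, subject in forward.items()
            if reverse.get(subject) == query}
```

Pairs that fail the test are not necessarily wrong (gene families with recent duplications often fail it), so they are better treated as candidates for manual review than discarded outright.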

ADD REPLY • link 12.9 years ago by Larry_Parnell 16k

score 2 · Answer 3 · 2011-12-06

OK, I don't know much about Drosophila, but it seems that UCSC has already computed the BLAST alignments in dm3.hgBlastTab. I've tried to link the table flyBaseGene to the human table kgXref, which describes the human genes, via dm3.hgBlastTab:

mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D dm3
mysql> select * from
dm3.flyBaseGene as F ,
dm3.hgBlastTab as B,
hg19.knownGene as G,
hg19.kgXref as X
where
G.name=B.target and
X.kgId=G.name  and
F.name=B.query
limit 1\G

*************************** 1. row ***************************
        bin: 637
       name: CG10354-RA
      chrom: chr2L
     strand: +
    txStart: 6908086
      txEnd: 6911432
   cdsStart: 6908165
     cdsEnd: 6910892
  exonCount: 1
 exonStarts: 6908086,
   exonEnds: 6911432,
      query: CG10354-RA
     target: uc002wsf.1
   identity: 56.59
  aliLength: 827
   mismatch: 316
    gapOpen: 11
     qStart: 0
       qEnd: 789
     tStart: 0
       tEnd: 822
     eValue: 0
   bitScore: 889
       name: uc002wsf.1
      chrom: chr20
     strand: +
    txStart: 21283941
      txEnd: 21370463
   cdsStart: 21284036
     cdsEnd: 21369976
  exonCount: 30
 exonStarts: 21283941,21306916,21307127,21309196,21311118,21311253,21312198,21312405,21312920,21314181,21314341,21314574,21314715,21319681,21321358,21324727,21327052,21328783,21328978,21330026,21335426,21336717,21337223,21338373,21346058,21346210,21349100,21362631,21367505,21369910,
   exonEnds: 21284111,21307044,21307239,21309308,21311177,21311343,21312271,21312456,21313078,21314256,21314475,21314632,21314823,21319726,21321490,21324846,21327188,21328891,21329068,21330099,21335510,21336815,21337303,21338430,21346127,21346342,21349228,21362695,21367644,21370463,
  proteinID: Q9H0D6
    alignID: uc002wsf.1
       kgID: uc002wsf.1
       mRNA: NM_012255
       spID: Q9H0D6
spDisplayID: XRN2_HUMAN
 geneSymbol: XRN2
     refseq: NM_012255
    protAcc: NP_036387
description: 5'-3' exoribonuclease 2
1 row in set (0.22 sec)
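The join logic can be tried offline with a toy copy of the tables. This is only a sketch: it uses an in-memory SQLite database rather than the UCSC MySQL server, the tables contain just a handful of the real columns, and the single row is copied from the output above. Selecting named columns instead of `select *` also avoids the repeated `name`, `chrom`, etc. columns seen in that output.

```python
import sqlite3

# Toy stand-ins for dm3.flyBaseGene, dm3.hgBlastTab, hg19.knownGene and hg19.kgXref.
# Only a few columns are kept; the values come from the example row shown above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flyBaseGene (name TEXT, chrom TEXT);
CREATE TABLE hgBlastTab ("query" TEXT, target TEXT, identity REAL, eValue REAL, bitScore REAL);
CREATE TABLE knownGene (name TEXT, chrom TEXT);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT, refseq TEXT);
INSERT INTO flyBaseGene VALUES ('CG10354-RA', 'chr2L');
INSERT INTO hgBlastTab VALUES ('CG10354-RA', 'uc002wsf.1', 56.59, 0, 889);
INSERT INTO knownGene VALUES ('uc002wsf.1', 'chr20');
INSERT INTO kgXref VALUES ('uc002wsf.1', 'XRN2', 'NM_012255');
""")

# Same join as above, written with explicit JOINs, named output columns,
# and a parameter to restrict the result to one human gene symbol.
SQL = """
SELECT F.name       AS flyTranscript,
       X.geneSymbol AS humanSymbol,
       X.refseq     AS humanRefseq,
       B.identity, B.eValue, B.bitScore
FROM flyBaseGene AS F
JOIN hgBlastTab  AS B ON F.name = B."query"
JOIN knownGene   AS G ON G.name = B.target
JOIN kgXref      AS X ON X.kgID = G.name
WHERE X.geneSymbol = ?
"""

rows = conn.execute(SQL, ("XRN2",)).fetchall()
for row in rows:
    print(row)
```

For a whole gene list, the `WHERE` clause could be replaced by an `IN (...)` list of symbols or RefSeq accessions, which is the part most relevant to the original question.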