Question: Identifying Homologues In Drosophila For Human Genes On Large Scale
7.4 years ago by
Wayne1000
United States
Wayne1000 wrote:

I have a list of genes that are mutated in human cancer samples and a collaborator who has a relevant genetic screen in Drosophila. I need to identify the Drosophila homologues of each mutated human gene in order to test them in this screen. I know this is a very complex problem, and I understand the complexities of homology. Still, my list is too long to do each gene manually. I have already consulted the HomoloGene database at NCBI and identified a good homologue for some of the genes, but many remain. Does anyone know of additional, trustworthy databases that catalogue this sort of information? If not, any advice on how to tackle the problem would be greatly appreciated. Thank you for your time!

database homology • 9.1k views
ADD COMMENTlink modified 7.1 years ago by Casey Bergman18k • written 7.4 years ago by Wayne1000
7.4 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

Adapted from http://www.ensembl.info/blog/2009/01/21/how-to-get-all-the-orthologous-genes-between-two-species/

  1. Go to: http://www.ensembl.org/biomart/martview
  2. Choose “Ensembl 64”
  3. Choose “Homo sapiens genes (GRCh37.p5)”
  4. Click on “Filters” in the left menu
  5. Unfold the “MULTI SPECIES COMPARISONS” box, tick the “Homolog filters” option and choose “Orthologous Drosophila Genes” from the drop-down menu.
  6. Click on “Attributes” in the left menu
  7. Click on “Homologs”
  8. Unfold the “ORTHOLOGS” box and tick the data you want to get from under the Drosophila Orthologs header (most probably the gene ID and maybe the homology type as well).
  9. Click on the “Results” button (top left)
  10. Choose your favorite output

This will give you a list of orthologues for all human genes. To restrict it to a subset, use the GENE -> ID list limit filter and paste in the list of IDs for your subset of interest.
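If you export the full table instead of filtering in BioMart, restricting it to your own gene list afterwards is straightforward. The following is only a minimal sketch, assuming a tab-separated export with three columns; the header names (`human_gene_id`, `fly_gene_id`, `homology_type`) and all gene IDs in it are hypothetical placeholders, so rename them to match the headers in your actual download.

```python
import csv
import io
from collections import defaultdict


def orthologs_for_subset(tsv_text, wanted_ids):
    """Map each wanted human gene ID to a list of (fly gene ID, homology type)."""
    wanted = set(wanted_ids)
    result = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        human = row["human_gene_id"]
        fly = row["fly_gene_id"]
        # Keep only genes on our list, and skip rows with no fly orthologue.
        if human in wanted and fly:
            result[human].append((fly, row["homology_type"]))
    return dict(result)


# Hypothetical export: placeholder IDs, not real Ensembl or FlyBase identifiers.
example_tsv = (
    "human_gene_id\tfly_gene_id\thomology_type\n"
    "HUMAN_A\tFLY_1\tortholog_one2one\n"
    "HUMAN_B\tFLY_2\tortholog_one2many\n"
    "HUMAN_B\tFLY_3\tortholog_one2many\n"
    "HUMAN_C\t\t\n"
    "HUMAN_D\tFLY_4\tortholog_one2one\n"
)

mapping = orthologs_for_subset(example_tsv, ["HUMAN_A", "HUMAN_B", "HUMAN_C"])
for human_id, fly_hits in sorted(mapping.items()):
    print(human_id, fly_hits)
```

Genes from your list that do not appear in the result (HUMAN_C here) have no orthologue in the export and are the ones worth following up by other means.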

ADD COMMENTlink written 7.4 years ago by Casey Bergman18k

I trust Compara about as much as I trust any in silico results that I haven't produced myself. To form your own judgement, read how the Compara gene trees are constructed: http://www.ensembl.org/info/docs/compara/homology_method.html

ADD REPLYlink written 7.4 years ago by Casey Bergman18k

This is an incredible resource. How much do you trust it, though? A lot of money is riding on this screen, and I want to express the appropriate amount of caution to my collaborators regarding its accuracy.

ADD REPLYlink written 7.4 years ago by Wayne1000

This is a good approach and I've voted it up. Two issues I have with pre-calculated results are 1) When was this done and how many new genes/mRNA isoforms were not part of the comparisons? 2) What comparison parameters were used to make the assignments? Those are often not well described.

ADD REPLYlink written 7.4 years ago by Larry_Parnell16k
7.4 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

I did this, also for a disease project (not cancer), by BLASTP similarity searches with E-value cutoffs of 1e-40, 1e-50 and 1e-100. Be aware that there will not always be a 1:1 correspondence of Drosophila to human.
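To see how the hit lists change across those cutoffs, the BLAST results can be filtered after the fact. This is a rough sketch, assuming the search was saved in BLAST's 12-column tabular format (`-outfmt 6`), where column 1 is the query, column 2 the subject and column 11 the E-value; the protein IDs and scores in the example are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def hits_by_cutoff(blast_tab_text, cutoffs=(1e-40, 1e-50, 1e-100)):
    """For each cutoff, map query ID -> set of subject IDs with E-value <= cutoff."""
    hits = {cutoff: defaultdict(set) for cutoff in cutoffs}
    for fields in csv.reader(io.StringIO(blast_tab_text), delimiter="\t"):
        if not fields or fields[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        for cutoff in cutoffs:
            if evalue <= cutoff:
                hits[cutoff][query].add(subject)
    return hits


def make_row(query, subject, evalue):
    # Only query, subject and E-value matter here; other columns are filler.
    return "\t".join([query, subject, "50.0", "300", "100", "5",
                      "1", "300", "1", "300", evalue, "500"])


# Hypothetical hits: placeholder IDs, not real accessions.
example = "\n".join([
    make_row("humanP1", "flyP1", "1e-120"),
    make_row("humanP1", "flyP2", "1e-45"),
    make_row("humanP2", "flyP3", "1e-60"),
    make_row("humanP3", "flyP4", "1e-10"),
])

hits = hits_by_cutoff(example)
for cutoff in sorted(hits, reverse=True):
    for query, subjects in sorted(hits[cutoff].items()):
        print(cutoff, query, sorted(subjects))
```

A query that keeps more than one subject at a given cutoff (humanP1 at 1e-40 in this toy data) is a case where the correspondence is not 1:1.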

ADD COMMENTlink written 7.4 years ago by Larry_Parnell16k

That's a good idea. Did you download the database to do this, or use a batch query? I've never BLASTed with a list before.

ADD REPLYlink written 7.4 years ago by Wayne1000

That's a good idea. How did you go about doing this? I have a list of mRNA accession numbers and I do have BLAST installed locally, but I'm not too familiar with large queries like this. Is there a batch query option on the website? And how did you verify the results when the BLAST finished? Thanks a ton for your help!

ADD REPLYlink written 7.4 years ago by Wayne1000

You can either do a batch query to convert the mRNA IDs to sequence queries, or use the IDs themselves as queries, if your local installation of BLAST allows this in the way the NCBI BLAST page does. A batch query is possible at NCBI via the web, but look at the help pages for details. In brief, verification was done by using the human hit as query back against a Drosophila database of RefSeq proteins.
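One common way to formalize this kind of back-search is a reciprocal best hit check, which may be stricter or looser than what was actually done here. A minimal sketch, assuming each search has already been reduced to (query, subject, bit score) triples; all IDs and scores below are hypothetical.

```python
def best_hits(rows):
    """Given (query, subject, bitscore) triples, return query -> top-scoring subject."""
    best = {}
    for query, subject, score in rows:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (human, fly) pairs that are each other's best hit."""
    fwd = best_hits(forward)  # human query -> best fly subject
    rev = best_hits(reverse)  # fly query -> best human subject
    return sorted((h, f) for h, f in fwd.items() if rev.get(f) == h)


# Hypothetical search results: placeholder IDs and scores.
human_vs_fly = [
    ("humanP1", "flyP1", 900.0),
    ("humanP1", "flyP2", 300.0),
    ("humanP2", "flyP2", 400.0),
]
fly_vs_human = [
    ("flyP1", "humanP1", 880.0),
    ("flyP2", "humanP1", 310.0),
    ("flyP2", "humanP2", 290.0),
]

pairs = reciprocal_best_hits(human_vs_fly, fly_vs_human)
print(pairs)
```

In this toy data humanP2's best fly hit prefers humanP1 on the way back, so the humanP2 pairing is not confirmed and would need a manual look.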

ADD REPLYlink written 7.4 years ago by Larry_Parnell16k
7.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

OK, I don't know much about Drosophila, but it seems that UCSC has already computed the BLAST alignments in dm3.hgBlastTab. I've tried to link the fly table flyBaseGene to the human table kgXref, which describes the human genes, via dm3.hgBlastTab and hg19.knownGene:

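mysql  --user=genome --host=genome-mysql.cse.ucsc.edu -A -D dm3
mysql> select * from
dm3.flyBaseGene as F
join dm3.hgBlastTab as B on F.name = B.query    -- fly transcript to its BLAST hit
join hg19.knownGene as G on G.name = B.target   -- BLAST hit to the human gene model
join hg19.kgXref as X on X.kgId = G.name        -- gene model to symbol, RefSeq, description
limit 1\G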

*************************** 1. row ***************************
        bin: 637
       name: CG10354-RA
      chrom: chr2L
     strand: +
    txStart: 6908086
      txEnd: 6911432
   cdsStart: 6908165
     cdsEnd: 6910892
  exonCount: 1
 exonStarts: 6908086,
   exonEnds: 6911432,
      query: CG10354-RA
     target: uc002wsf.1
   identity: 56.59
  aliLength: 827
   mismatch: 316
    gapOpen: 11
     qStart: 0
       qEnd: 789
     tStart: 0
       tEnd: 822
     eValue: 0
   bitScore: 889
       name: uc002wsf.1
      chrom: chr20
     strand: +
    txStart: 21283941
      txEnd: 21370463
   cdsStart: 21284036
     cdsEnd: 21369976
  exonCount: 30
 exonStarts: 21283941,21306916,21307127,21309196,21311118,21311253,21312198,21312405,21312920,21314181,21314341,21314574,21314715,21319681,21321358,21324727,21327052,21328783,21328978,21330026,21335426,21336717,21337223,21338373,21346058,21346210,21349100,21362631,21367505,21369910,
   exonEnds: 21284111,21307044,21307239,21309308,21311177,21311343,21312271,21312456,21313078,21314256,21314475,21314632,21314823,21319726,21321490,21324846,21327188,21328891,21329068,21330099,21335510,21336815,21337303,21338430,21346127,21346342,21349228,21362695,21367644,21370463,
  proteinID: Q9H0D6
    alignID: uc002wsf.1
       kgID: uc002wsf.1
       mRNA: NM_012255
       spID: Q9H0D6
spDisplayID: XRN2_HUMAN
 geneSymbol: XRN2
     refseq: NM_012255
    protAcc: NP_036387
description: 5'-3' exoribonuclease 2
1 row in set (0.22 sec)
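
For readers less familiar with SQL, the same chain of joins can be tried on a toy in-memory database. This is only an illustrative sketch using Python's built-in sqlite3 module rather than the UCSC server: the tables are cut down to a few columns, the one real row is copied from the output above, and the second fly transcript (CG_hypothetical-RA) is invented to show what happens when there is no BLAST hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flyBaseGene (name TEXT, chrom TEXT);
CREATE TABLE hgBlastTab (query TEXT, target TEXT, identity REAL, eValue REAL);
CREATE TABLE knownGene (name TEXT, chrom TEXT);
CREATE TABLE kgXref (kgID TEXT, geneSymbol TEXT, refseq TEXT);

INSERT INTO flyBaseGene VALUES ('CG10354-RA', 'chr2L');
INSERT INTO flyBaseGene VALUES ('CG_hypothetical-RA', 'chr3R');  -- invented, no hit
INSERT INTO hgBlastTab VALUES ('CG10354-RA', 'uc002wsf.1', 56.59, 0);
INSERT INTO knownGene VALUES ('uc002wsf.1', 'chr20');
INSERT INTO kgXref VALUES ('uc002wsf.1', 'XRN2', 'NM_012255');
""")

# hgBlastTab is the bridge: its query column holds the fly transcript name and
# its target column holds the human known-gene ID of the best BLAST hit.
rows = conn.execute("""
SELECT F.name, B.identity, X.geneSymbol, X.refseq
FROM flyBaseGene AS F
JOIN hgBlastTab AS B ON F.name = B.query
JOIN knownGene AS G ON G.name = B.target
JOIN kgXref AS X ON X.kgID = G.name
""").fetchall()

for row in rows:
    print(row)
```

Because these are inner joins, a fly transcript without an entry in hgBlastTab simply drops out of the result.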
ADD COMMENTlink written 7.4 years ago by Pierre Lindenbaum118k

I apologize, I am not too familiar with SQL or with what you are doing here. Could you provide a little more description and perhaps refer me to some reading? Is dm3.hgBlastTab a precomputed alignment against other genomes? So the idea here is to get the relevant rows from the fly tables and link them to the human ones based on the alignment? Thanks for your time, I really appreciate it.

ADD REPLYlink written 7.4 years ago by Wayne1000