I have a list of genes that are mutated in human cancer samples, and a collaborator who has a relevant genetic screen in Drosophila. I need to identify the Drosophila homologues of each mutated human gene in order to test them in this screen. I know this is a very complex problem, and I understand the complexities of homology, but my list is too long to do each gene manually. I have already checked the HomoloGene database at NCBI and was able to identify a good homologue for some of the genes, but many remain. Does anyone know of additional, trustworthy databases that catalogue this sort of information? If not, any advice on how to tackle the problem would be greatly appreciated. Thank you for your time!
Adapted from http://www.ensembl.info/blog/2009/01/21/how-to-get-all-the-orthologous-genes-between-two-species/
- Go to: http://www.ensembl.org/biomart/martview
- Choose “Ensembl 64”
- Choose “Homo sapiens genes (GRCh37.p5)”
- Click on “Filters” in the left menu
- Unfold the “MULTI SPECIES COMPARISONS” box, tick the “Homolog filters” option and choose “Orthologous Drosophila Genes” from the drop-down menu.
- Click on “Attributes” in the left menu
- Click on “Homologs”
- Unfold the “ORTHOLOGS” box and tick the data you want to get from under the Drosophila Orthologs header (most probably the gene ID and maybe the homology type as well).
- Click on the “Results” button (top left)
- Choose your favorite output
This will give you a list of orthologues for all human genes. To restrict the results to a subset, use the GENE -> ID list limit filter and paste in the list of IDs you are interested in.
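For a long gene list, the same query can be scripted instead of clicked through. The following is a minimal sketch, not an official recipe: it assumes the BioMart service endpoint `/biomart/martservice`, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names `with_dmelanogaster_homolog`, `dmelanogaster_homolog_ensembl_gene` and `dmelanogaster_homolog_orthology_type`. These names have changed between Ensembl releases, so check them against the attribute list of the release you use. The function names are hypothetical.

```python
import urllib.parse
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed service endpoint; the web interface itself lives at /biomart/martview.
BIOMART_URL = "http://www.ensembl.org/biomart/martservice"


def build_ortholog_query(gene_ids):
    """Build a BioMart XML query for fly orthologues of human Ensembl gene IDs."""
    id_list = ",".join(gene_ids)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<!DOCTYPE Query>"
        '<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1">'
        '<Dataset name="hsapiens_gene_ensembl" interface="default">'
        # Restrict to the genes of interest (the "ID list limit" filter).
        f'<Filter name="ensembl_gene_id" value={quoteattr(id_list)}/>'
        # Keep only genes that have a Drosophila orthologue (assumed filter name).
        '<Filter name="with_dmelanogaster_homolog" excluded="0"/>'
        '<Attribute name="ensembl_gene_id"/>'
        # Assumed attribute names for the fly gene ID and the homology type.
        '<Attribute name="dmelanogaster_homolog_ensembl_gene"/>'
        '<Attribute name="dmelanogaster_homolog_orthology_type"/>'
        "</Dataset></Query>"
    )


def fetch_orthologs(gene_ids):
    """Send the query to BioMart and return the tab-separated result as text."""
    params = urllib.parse.urlencode({"query": build_ortholog_query(gene_ids)})
    with urllib.request.urlopen(BIOMART_URL + "?" + params) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Example Ensembl gene IDs (TP53 and BRCA1); replace with your own list.
    print(fetch_orthologs(["ENSG00000141510", "ENSG00000012048"]))
```

For several hundred IDs it is probably safer to send them in batches, since the whole query travels in the URL.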
I did this, also for a disease project (not cancer), using BLASTP similarity searches with E-value cutoffs of 1e-40, 1e-50 and 1e-100. Be aware that there will not always be a 1:1 correspondence between Drosophila and human genes.
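To make the cutoff comparison concrete, here is a small sketch of how the hits could be filtered. It assumes the BLASTP results were saved in the standard 12-column tabular format (`-outfmt 6` in BLAST+), where the E-value is the 11th column. The function name and the sample IDs are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def hits_below_cutoff(blast_tab, max_evalue):
    """Map each query ID to the set of subject IDs with E-value <= max_evalue.

    blast_tab is the text of a 12-column BLAST tabular report.
    """
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue:
            hits[query].add(subject)
    return dict(hits)


# Hypothetical hits: query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, E-value, bit score.
SAMPLE = "\n".join([
    "humanA\tflyX\t56.6\t827\t316\t11\t1\t789\t1\t822\t1e-120\t889",
    "humanA\tflyY\t41.2\t300\t160\t5\t10\t305\t8\t300\t1e-45\t180",
    "humanB\tflyX\t48.0\t410\t200\t6\t1\t405\t3\t410\t1e-60\t230",
    "humanC\tflyZ\t30.1\t120\t80\t2\t5\t120\t9\t125\t1e-10\t60",
])

if __name__ == "__main__":
    for cutoff in (1e-40, 1e-50, 1e-100):
        print(cutoff, hits_below_cutoff(SAMPLE, cutoff))
```

In this toy data, humanA keeps two fly hits at 1e-40 but only one at 1e-50, and flyX is hit by both humanA and humanB, which is the many-to-many situation to watch for.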
OK, I don't know much about Drosophila, but it seems that UCSC has already computed the BLAST alignments in dm3.hgBlastTab. I tried linking the table flyBaseGene to the human table kgXref, which describes the human genes, via dm3.hgBlastTab:
    mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D dm3

    mysql> select * from dm3.flyBaseGene as F , dm3.hgBlastTab as B, hg19.knownGene as G, hg19.kgXref as X where G.name=B.target and X.kgId=G.name and F.name=B.query limit 1\G
    *************************** 1. row ***************************
    bin: 637
    name: CG10354-RA
    chrom: chr2L
    strand: +
    txStart: 6908086
    txEnd: 6911432
    cdsStart: 6908165
    cdsEnd: 6910892
    exonCount: 1
    exonStarts: 6908086,
    exonEnds: 6911432,
    query: CG10354-RA
    target: uc002wsf.1
    identity: 56.59
    aliLength: 827
    mismatch: 316
    gapOpen: 11
    qStart: 0
    qEnd: 789
    tStart: 0
    tEnd: 822
    eValue: 0
    bitScore: 889
    name: uc002wsf.1
    chrom: chr20
    strand: +
    txStart: 21283941
    txEnd: 21370463
    cdsStart: 21284036
    cdsEnd: 21369976
    exonCount: 30
    exonStarts: 21283941,21306916,21307127,21309196,21311118,21311253,21312198,21312405,21312920,21314181,21314341,21314574,21314715,21319681,21321358,21324727,21327052,21328783,21328978,21330026,21335426,21336717,21337223,21338373,21346058,21346210,21349100,21362631,21367505,21369910,
    exonEnds: 21284111,21307044,21307239,21309308,21311177,21311343,21312271,21312456,21313078,21314256,21314475,21314632,21314823,21319726,21321490,21324846,21327188,21328891,21329068,21330099,21335510,21336815,21337303,21338430,21346127,21346342,21349228,21362695,21367644,21370463,
    proteinID: Q9H0D6
    alignID: uc002wsf.1
    kgID: uc002wsf.1
    mRNA: NM_012255
    spID: Q9H0D6
    spDisplayID: XRN2_HUMAN
    geneSymbol: XRN2
    refseq: NM_012255
    protAcc: NP_036387
    description: 5'-3' exoribonuclease 2
    1 row in set (0.22 sec)
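Since `select *` returns every column of every joined table, a narrower query restricted to your own gene list may be easier to work with. The sketch below only builds the SQL text; run it with whatever MySQL client you prefer. It uses the table and column names visible in the output above, and it assumes that joining hg19.kgXref directly on the BLAST target gives the same rows as going through hg19.knownGene. The function name is hypothetical.

```python
import re

# Accept only plain gene symbols, so nothing unexpected ends up in the SQL text.
SYMBOL_RE = re.compile(r"^[A-Za-z0-9_.\-]+$")


def build_fly_hit_query(symbols):
    """Return SQL listing fly BLAST hits for the given human gene symbols."""
    for symbol in symbols:
        if not SYMBOL_RE.match(symbol):
            raise ValueError(f"unexpected characters in gene symbol: {symbol!r}")
    in_list = ", ".join(f"'{symbol}'" for symbol in symbols)
    return (
        "SELECT X.geneSymbol, F.name AS flyGene, B.identity, B.eValue, B.bitScore "
        "FROM dm3.flyBaseGene AS F "
        "JOIN dm3.hgBlastTab AS B ON F.name = B.query "
        # Assumption: kgXref.kgID matches hgBlastTab.target directly.
        "JOIN hg19.kgXref AS X ON X.kgID = B.target "
        f"WHERE X.geneSymbol IN ({in_list}) "
        "ORDER BY X.geneSymbol, B.bitScore DESC"
    )


if __name__ == "__main__":
    print(build_fly_hit_query(["XRN2", "TP53"]))
```

Sorting by bit score within each gene symbol puts the strongest fly hit first, but a human gene may still return several fly transcripts, or none at all.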