Question: Retrieve all genes associated with a GO term
enricoferrero (United Kingdom) wrote, 6.7 years ago:

Hi,

I'm looking for an easy way to retrieve all the genes in a list that are associated with a certain GO term, preferably using R/Bioconductor packages. I'm not interested in under/overrepresentation or enrichment.

For instance, say I have a list of 1000 genes and I want to create a sublist with only the genes known to be involved in 'heart development'.

Thanks!

Tags: R, bioconductor, go, gene-ontology
Dave Bridges (Ann Arbor, MI) wrote, 6.1 years ago:

Using biomaRt within R:

library(biomaRt)
ensembl <- useMart("ensembl", dataset = "hsapiens_gene_ensembl") # uses human Ensembl annotations
# gets gene symbol, transcript ID and GO ID for all genes annotated with GO:0007507
# (newer Ensembl releases name this filter 'go'; check listFilters(ensembl))
gene.data <- getBM(attributes = c('hgnc_symbol', 'ensembl_transcript_id', 'go_id'),
                   filters = 'go_id', values = 'GO:0007507', mart = ensembl)

Hi, I've been trying to use this. It worked ages ago, but now I keep getting an error saying:

"Error in getBM(attributes = c("wikigene_name", "ensembl_transcript_id",  : 
  Invalid filters(s): go_id"

Any suggestions? Thanks.

(Reply by V, 20 months ago.)

Change the filter "go_id" to "go".

You can find the valid filter names with listFilters(ensembl).

(Reply by agatawesol, 20 months ago.)

This seems to return only the genes annotated directly to the given GO term, not the genes annotated to its child terms. Usually one wants ALL the genes for a GO term, i.e., both direct and indirect annotations.

(Reply by hbw, 19 days ago.)
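A minimal sketch of one way to include indirect annotations: expand the term to all of its descendants first, then query with the whole set (for example, by passing the IDs as the `values` vector of the getBM() call above). It assumes a local copy of the ontology in OBO format (such as go-basic.obo from the GO website) and follows only `is_a` and `part_of` links; which relations to follow is a choice, and other tools may use a different set. The function name `go_descendants` is hypothetical.

```shell
# go_descendants OBO_FILE GO_ID   (hypothetical helper)
# Prints GO_ID followed by every term reachable from it by walking
# is_a and part_of links downwards (breadth-first), one ID per line.
# Assumes a local GO file in OBO 1.2 format, e.g. go-basic.obo.
go_descendants() {
  awk -v root="$2" '
    # A new stanza starts: only [Term] stanzas are of interest.
    /^\[/ { in_term = ($0 == "[Term]"); id = ""; next }
    !in_term { next }
    /^id: / { id = $2; next }
    # "is_a: GO:parent ! name" => record the current id as a child of GO:parent.
    /^is_a: / { children[$2] = children[$2] " " id; next }
    # "relationship: part_of GO:parent ! name" => same, for part_of links.
    /^relationship: part_of / { children[$3] = children[$3] " " id; next }
    END {
      # Breadth-first walk from the root term, visiting each term once.
      queue[1] = root; seen[root] = 1; head = 1; tail = 1
      while (head <= tail) {
        cur = queue[head]; head++
        print cur
        n = split(children[cur], kids, " ")
        for (i = 1; i <= n; i++) {
          if (!(kids[i] in seen)) { seen[kids[i]] = 1; tail++; queue[tail] = kids[i] }
        }
      }
    }' "$1"
}
```

For example, `go_descendants go-basic.obo GO:0007507 > heart_terms.txt` would list heart development and its subterms, one per line.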
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 6.7 years ago:

Not using R: use QuickGO to download all the annotations for your term (and its descendants):

$ curl -s "http://www.ebi.ac.uk/QuickGO/GAnnotation?tax=9606&relType=IP&goid=%20GO:0007507%20&format=tsv" | head | verticalize
>>>    2
$1    DB           UniProtKB
$2    ID           A0PJ49
$3    Splice       -
$4    Symbol       FGFRL1
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0003179
$8    GO Name      heart valve morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000013633
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    2

>>>    3
$1    DB           UniProtKB
$2    ID           A0PJ49
$3    Splice       -
$4    Symbol       FGFRL1
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0060412
$8    GO Name      ventricular septum morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000013633
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    3

>>>    4
$1    DB           UniProtKB
$2    ID           A0SZU5
$3    Splice       -
$4    Symbol       -
$5    Taxon        9606
$6    Qualifier    -
$7    GO ID        GO:0003007
$8    GO Name      heart morphogenesis
$9    Reference    GO_REF:0000019
$10    Evidence     IEA
$11    With         Ensembl:ENSMUSP00000058354
$12    Aspect       Process
$13    Date         20120825
$14    Source       ENSEMBL
<<<    4

(...)

Then sort the output and join it with your list of genes.
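The sort-and-join step might look like the following sketch. It assumes the curl output was saved to a file instead of being piped to `head | verticalize`, that the file keeps its header row, and that the gene symbol is the fourth tab-separated column (as in the `$4 Symbol` field above). `genes_with_annotation` and the file names are hypothetical.

```shell
# genes_with_annotation ANNOTATION_TSV GENE_LIST   (hypothetical helper)
# ANNOTATION_TSV: tab-separated QuickGO download with a header row; the gene
#                 symbol is assumed to be column 4, as in the output above.
# GENE_LIST:      one gene symbol per line.
# Prints the symbols found in both files, sorted and de-duplicated.
genes_with_annotation() {
  my_genes=$(mktemp) || return 1
  # Sort your list in byte order so it matches the annotation side for join.
  LC_ALL=C sort -u "$2" > "$my_genes"
  # Drop the header, keep the Symbol column, drop "-" (no symbol), de-duplicate,
  # then keep only the symbols that also appear in your list.
  tail -n +2 "$1" | cut -f4 | grep -v '^-$' | LC_ALL=C sort -u |
    LC_ALL=C join - "$my_genes"
  rm -f "$my_genes"
}
```

Usage would be along the lines of `genes_with_annotation heart_annotations.tsv my_genes.txt > heart_genes.txt`. `join` requires both inputs to be sorted under the same collation, which is why both `sort` calls and `join` run with `LC_ALL=C`.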


I love how easily you can query data and customize the output with QuickGO, but the verticalize in that command really threw me. I thought it was a cool member of the Unix toolchain I didn't know, but I googled it and found out it's actually a program you wrote! What a great way to make sense of data that wraps over numerous lines.

(Reply by SES, 6.7 years ago.)
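For readers without verticalize installed, a rough awk stand-in is sketched below. It is not that program and only approximates its layout: each data row of a tab-separated file (first row taken as the header) is printed as one `$N`, header, value line per column, framed by `>>>`/`<<<` markers numbered by input line. The function name `verticalize_tsv` is hypothetical.

```shell
# verticalize_tsv [FILE]   (hypothetical helper, reads stdin if no FILE)
# Rough approximation of the verticalize output shown above: every data row
# becomes one "$N<TAB>header<TAB>value" line per column, between >>> and <<<.
verticalize_tsv() {
  awk -F'\t' '
    # First line: remember the column names.
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i; ncol = NF; next }
    {
      printf ">>>\t%d\n", NR
      for (i = 1; i <= ncol; i++) printf "$%d\t%s\t%s\n", i, name[i], $i
      printf "<<<\t%d\n\n", NR
    }' "${1:--}"
}
```

It could replace verticalize at the end of the curl pipeline, e.g. `... | head | verticalize_tsv`.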
Malachi Griffith (Washington University School of Medicine, St. Louis, USA) wrote, 6.6 years ago:

Similarly, with AmiGO you can craft URLs within a script to pull this information for a list of GO IDs (or query it in many other ways). For example:

http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?gptype=all&speciesdb=all&taxid=9606&evcode=all&term_assocs=all&term=GO:0007507&action=filter&format=rdfxml

That returns entries for GO:0007507 for human only (taxid=9606), with all evidence codes and all term associations allowed. Results are returned in RDF/XML format for convenient parsing. You can also download them in the GO association ('go_assoc') format.
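A minimal sketch of crafting those URLs in a script for a list of GO IDs, reusing the exact parameters from the URL above and only swapping in the term (and optionally the format). The function name `amigo_url` is hypothetical, `go_assoc` is assumed to be the format value the text refers to, and this AmiGO CGI endpoint may no longer be served.

```shell
# amigo_url GO_ID [FORMAT]   (hypothetical helper)
# Rebuilds the AmiGO term-association URL above for any GO ID.
# FORMAT defaults to rdfxml; "go_assoc" is an assumed alternative value.
amigo_url() {
  printf 'http://amigo.geneontology.org/cgi-bin/amigo/term-assoc.cgi?gptype=all&speciesdb=all&taxid=9606&evcode=all&term_assocs=all&term=%s&action=filter&format=%s\n' \
    "$1" "${2:-rdfxml}"
}

# Usage: one download per GO ID in a list (the endpoint may no longer respond).
# for go in GO:0007507 GO:0003007; do
#   curl -s "$(amigo_url "$go")" > "assoc_${go#GO:}.xml"
# done
```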

enricoferrero (United Kingdom) wrote, 6.6 years ago:

In the end I just used FlyMine, which is handy because it's where I store my lists of genes anyway. On the list page it's just a matter of selecting the right filter (i.e. GO parent term: heart development).

Thanks for your suggestions anyway!

EagleEye (Sweden) wrote, 16 months ago:

Try GeneSCF enrichment analysis: it reports all of your 1000 genes and lists every associated GO term, irrespective of statistical significance.
