Gene Ontology Sql For All Descendants
1
0
Entering edit mode
10.7 years ago
arieszter • 0

Dear GO SQL Experts,

I would like to create a GO annotation text file which contains the set of genes connected to each GOids and its all descendants (children, grandchildren, etc.).

Can somebody give me some hints how to do this? (I am working with D.melanogaster and FlyBase geneIDs - FBgn)

I already created an SQl query for this:

SELECT

term.acc AS superterm_acc,

dbxref.xref_key AS gp_acc,

evidence.code AS code

FROM term

INNER JOIN graph_path ON term.id=graph_path.term1_id)

INNER JOIN association ON (graph_path.term2_id=association.term_id)

INNER JOIN evidence ON (association.term_id=evidence.id)

INNER JOIN gene_product ON (association.gene_product_id=gene_product.id)

INNER JOIN species ON (gene_product.species_id=species.id)

INNER JOIN dbxref ON (gene_product.dbxref_id=dbxref.id)

WHERE

species.genus = 'Drosophila'

AND

species = 'melanogaster';

But I am not sure if the query is correct since the evidence codes that I get are seems to be imperfect: bash$ awk '{print $3}' downloaded_file | sort | uniq -c

6387 IDA, 921948 IEA, 2911 IEP, 113 IGC, 257 IMP, 123 ISA, 13 ISO, 664 ISS, 904 NAS, 583144 ND, 24598 RCA

accordnig to this query I got more kinds of evidence codes:

SELECT evidence.code, count(DISTINCT association.id)

FROM gene_product

INNER JOIN dbxref ON (gene_product.dbxref_id=dbxref.id)

INNER JOIN association ON gene_product.id=association.gene_product_id)

INNER JOIN species ON (gene_product.species_id=species.id)

INNER JOIN evidence ON association.id=evidence.association_id)

WHERE dbxref.xref_dbname = 'FB'

AND

species.genus = 'Drosophila'

AND

species = 'melanogaster'

GROUP BY evidence.code;

IBA 1254, IC 1130, IDA 11801, IEA 13541, IEP 974, IGC 18, IGI 2016, IKR 63, IMP 18364, IPI 1889, IRD 9, ISA 141, ISM 1301, ISO 1, ISS 14862, NAS 10277, ND 6899, RCA 19, TAS 4732

Cheers, Eszter Ari Institut für Populationsgenetik Vetmeduni Vienna Veterinärplatz 1 1210 Wien, Austria

go • 2.4k views
ADD COMMENT
1
Entering edit mode
10.1 years ago
cmungall ▴ 30

I think your first query isalmost correct (note that it gives you these genes plus the direct annotation term and its ancestors, not descendants. This is what you want).

One problem is that you don't have a DISTINCT clause, which will lead to overcounting when you pass things through the shell command.

You should expect to get a lot of rows this way - are you querying via a local MySQL database? Do you have any limits set?

The # of RCAs in the first query is unusual, as FlyBase does not have many RCAs. I will investigate

The second query doesn't use the graph_path table, so you'll only get direct annotations.

ADD COMMENT

Login before adding your answer.

Traffic: 2008 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6