Question: GO annotations with or without gene isoforms
2.6 years ago by Baylie_321 (London)
Baylie_321 wrote:

Hiya,

I have a BLAST database for GO terms in Blast2GO that includes IDs for all the isoforms of the genes. When I make my gene lists for GO enrichment analysis, the list compiler pulls the IDs of all the isoforms associated with the genes of interest (DEGs). My question is: should I

(a) De-duplicate the list so just one ID per gene is input into the GO enrichment analysis

or

(b) Submit the full list containing the IDs of all the isoforms for each gene of interest?

I have run both and, as I anticipated, the de-duplicated list contains fewer GO terms than the full list containing all the isoforms.

I feel it is correct to run the full list of IDs (option b), because otherwise the enrichment test could be negatively biased for terms that have many isoforms in the database but only one submitted, making the GO term look less enriched than it actually is (I hope that makes sense).

Best wishes and any opinions/advice are greatly appreciated,

Rebekah

rna-seq go terms • 640 views
modified 2.5 years ago by Biostar • written 2.6 years ago by Baylie_321
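The bias worry in the question can be made concrete with a toy calculation. The sketch below is not Blast2GO's actual procedure; it is a plain one-sided hypergeometric test written with the standard library, and every number in it is invented for illustration (1000 background genes, one GO term annotating 50 genes that each have 6 isoforms, all other genes having 2 isoforms, and a DEG list of 100 genes of which 15 carry the term).

```python
from math import comb


def enrichment_p(hits, sample_size, annotated, background_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits            : IDs in the submitted list that carry the term
    sample_size     : total IDs in the submitted list
    annotated       : IDs in the background that carry the term
    background_size : total IDs in the background
    """
    upper = min(sample_size, annotated)
    total = comb(background_size, sample_size)
    tail = sum(
        comb(annotated, i) * comb(background_size - annotated, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers (assumptions, not data from this thread).
ISO_PER_TERM_GENE = 6    # isoforms per gene carrying the term
ISO_PER_OTHER_GENE = 2   # isoforms per gene without the term

bg_genes, bg_term_genes = 1000, 50
deg_genes, deg_term_genes = 100, 15

bg_isoforms = (bg_term_genes * ISO_PER_TERM_GENE
               + (bg_genes - bg_term_genes) * ISO_PER_OTHER_GENE)
bg_term_isoforms = bg_term_genes * ISO_PER_TERM_GENE
deg_isoforms = (deg_term_genes * ISO_PER_TERM_GENE
                + (deg_genes - deg_term_genes) * ISO_PER_OTHER_GENE)
deg_term_isoforms = deg_term_genes * ISO_PER_TERM_GENE

# Gene list tested against a gene-level background.
p_gene = enrichment_p(deg_term_genes, deg_genes, bg_term_genes, bg_genes)
# De-duplicated gene list tested against the isoform-level background.
p_mixed = enrichment_p(deg_term_genes, deg_genes, bg_term_isoforms, bg_isoforms)
# Full isoform list tested against the isoform-level background.
p_isoform = enrichment_p(deg_term_isoforms, deg_isoforms,
                         bg_term_isoforms, bg_isoforms)

print(f"gene list vs gene background:       {p_gene:.3g}")
print(f"gene list vs isoform background:    {p_mixed:.3g}")
print(f"isoform list vs isoform background: {p_isoform:.3g}")
```

In this toy setup the de-duplicated list against an isoform-level background looks far less enriched than a consistent gene-level test, which matches the concern raised above. The full isoform list gives the smallest p-value of the three, but note that the test treats isoforms of one gene as independent draws, which they are not, so larger counts of enriched terms from the full list are not automatically more trustworthy. Keeping the list and the background at the same level (both gene or both isoform) is the part that matters.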

"the de-duplicated list as I anticipated contains less GO terms"

Fewer in what sense? In total number (which would indeed not be surprising), or also in content (i.e. there are terms in the isoform set that are not in the de-duplicated one)?

written 2.6 years ago by lieven.sterck

I'm not sure what you mean; are they not the same thing? Fewer total GO terms = less content?

Terms found enriched (de-duplicated vs full):

group 1: 213 terms vs 832
group 2: 354 terms vs 809
group 3: 575 terms vs 1052
group 4: 18 terms vs 83

written 2.6 years ago by Baylie_321

Transcript-Level Versus Gene-Level Go Enrichment Analysis (For Non-Model Organism)

Based on the answer here, since the BLAST database contains all isoforms, I feel that collapsing my input list to just one isoform per gene would bias the enrichment, so the full list should be kept.

written 2.6 years ago by Baylie_321

I mean you will likely get the same GO terms for all isoforms of a gene (or at least a large overlap). So if you count the total number, it will indeed go down when you leave out the isoforms. It might pay off to check whether there are isoform-specific terms (i.e. terms that are present for one isoform but not for the others).

written 2.6 years ago by lieven.sterck
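A quick way to run the check suggested in the comment above is to compare the GO term sets of the isoforms within each gene. This is only a sketch: the `isoform_go` dictionary is a hand-made toy example, and `gene_of` assumes isoform IDs of the form `<gene>.<number>`, which may not match the IDs in your Blast2GO export.

```python
from collections import defaultdict

# Toy annotations (hypothetical): isoform ID -> set of GO terms.
isoform_go = {
    "geneA.1": {"GO:0006412", "GO:0005840"},
    "geneA.2": {"GO:0006412", "GO:0005840"},
    "geneB.1": {"GO:0016301"},
    "geneB.2": {"GO:0016301", "GO:0005524"},
}


def gene_of(isoform_id):
    # Assumed naming scheme "<gene>.<isoform number>"; adapt to your own IDs.
    return isoform_id.rsplit(".", 1)[0]


def summarise(annotations):
    """Per gene: union of all terms and the terms not shared by every isoform."""
    by_gene = defaultdict(list)
    for isoform_id, terms in annotations.items():
        by_gene[gene_of(isoform_id)].append(set(terms))

    summary = {}
    for gene, term_sets in by_gene.items():
        shared = set.intersection(*term_sets)
        union = set.union(*term_sets)
        summary[gene] = {"union": union, "isoform_specific": union - shared}
    return summary


summary = summarise(isoform_go)
for gene, info in sorted(summary.items()):
    print(gene, len(info["union"]), "terms;",
          "isoform-specific:", sorted(info["isoform_specific"]))
```

If the `isoform_specific` sets are empty for nearly all genes, the isoforms add no annotation content and the `union` sets can serve as a gene-level annotation without losing terms. If many genes have isoform-specific terms, picking one arbitrary isoform per gene would drop real annotations.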

GO annotation only gives you a general picture of your gene list, nothing more detailed; you are unlikely to find isoform-specific terms.

modified 2.5 years ago • written 2.5 years ago by seta