Question

GO annotations with or without gene isoforms

5.9 years ago

Baylie_321 ▴ 30

Hiya,

I have a BLAST database for GO terms in Blast2GO which includes IDs for all the isoforms of the genes. When I make my gene lists for GO enrichment analysis, the list compiler pulls the IDs of all the isoforms associated with the genes of interest (DEG). My question is: should I

(a) De-duplicate the list so just one ID per gene is input into the GO enrichment analysis

or

(b) Submit the full list containing the IDs of all the isoforms for each gene of interest?

I have run both and the de-duplicated list as I anticipated contains less GO terms than the full list containing all the isoforms.

I feel that it is correct to run the full list of IDs (option b), because otherwise the enrichment test could be negatively biased for terms where lots of isoforms are present in the database but only one is submitted, making it look like the GO term is less enriched than it actually is (I hope that makes sense).
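To make the concern above concrete, here is a minimal sketch of the over-representation arithmetic, assuming the enrichment test is a one-sided hypergeometric (Fisher-type) test. All counts are made up for illustration, and `hypergeom_sf` is a hypothetical helper, not part of Blast2GO. It only shows that the p-value changes when the submitted list and the database count different units (genes vs isoforms); it does not say which choice is right for a given dataset.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when drawing N items without replacement from a
    population of M items, n of which are annotated with the term."""
    upper = min(n, N)
    total = comb(M, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers, invented for illustration only.
# Database (background) counted in isoforms: 1000 isoforms, 100 carry the term.
BACKGROUND_SIZE = 1000
BACKGROUND_WITH_TERM = 100

# Case 1: submitted list also counted in isoforms (50 isoforms, 15 with the term).
p_matched = hypergeom_sf(15, BACKGROUND_SIZE, BACKGROUND_WITH_TERM, 50)

# Case 2: submitted list collapsed to genes (20 genes, 5 with the term),
# but still compared against the isoform-level background.
p_mixed = hypergeom_sf(5, BACKGROUND_SIZE, BACKGROUND_WITH_TERM, 20)

print(f"isoform list vs isoform background: p = {p_matched:.3g}")
print(f"gene list vs isoform background:    p = {p_mixed:.3g}")
```

With these toy counts the collapsed list gives a larger p-value for the same term, which is the effect described in the paragraph above. The same helper could be used to compare a gene-level list against a gene-level background, if such a background is available.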

Best wishes and any opinions/advice are greatly appreciated,

Rebekah

RNA-Seq GO terms • 1.3k views

ADD COMMENT • link updated 5.7 years ago by Biostar 20 • written 5.9 years ago by Baylie_321 ▴ 30

the de-duplicated list as I anticipated contains less GO terms

Less in what sense? As in total number (which would indeed not be surprising), or also less in content (== there are terms that are in the isoform set but not in the de-duplicated one)?

ADD REPLY • link 5.9 years ago by lieven.sterck 15k

I'm not sure what you mean, are they not the same thing? Less total GO terms = less content?

Terms found enriched (de-duplicated vs full):

group 1: 213 terms vs 832
group 2: 354 terms vs 809
group 3: 575 terms vs 1052
group 4: 18 terms vs 83

ADD REPLY • link 5.9 years ago by Baylie_321 ▴ 30

Transcript-Level Versus Gene-Level Go Enrichment Analysis (For Non-Model Organism)

Based on the answer here, since the BLAST database contains all isoforms, I feel that collapsing my input list to just one isoform per gene would bias the enrichment, so the full list should be kept.

ADD REPLY • link 5.9 years ago by Baylie_321 ▴ 30

I mean you will likely get the same GO terms for all isoforms (or at least a large overlap). So if you count the total number, it will indeed go down when leaving out the isoforms. It might pay off to check whether there are isoform-specific terms (== terms that are present for one isoform but not for the others).
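The check suggested above could be sketched roughly as follows. This assumes the annotations are available as a mapping from isoform ID to a set of GO terms, and that the gene ID can be recovered from the isoform ID. The `_i<number>` suffix, the IDs, and the function names are all hypothetical, so adjust them to whatever the actual IDs look like.

```python
from collections import defaultdict


def gene_of(isoform_id):
    """Hypothetical ID scheme: 'geneA_i2' -> 'geneA'."""
    return isoform_id.rsplit("_i", 1)[0]


def isoform_specific_terms(annotations):
    """For each gene, return the GO terms that are annotated to at least
    one of its isoforms but not to all of them."""
    term_sets_by_gene = defaultdict(list)
    for isoform_id, terms in annotations.items():
        term_sets_by_gene[gene_of(isoform_id)].append(set(terms))

    result = {}
    for gene, term_sets in term_sets_by_gene.items():
        union = set().union(*term_sets)          # terms seen on any isoform
        shared = set.intersection(*term_sets)    # terms seen on every isoform
        if union - shared:
            result[gene] = union - shared
    return result


# Toy annotations, invented for illustration only.
toy_annotations = {
    "geneA_i1": {"GO:0008150", "GO:0003674"},
    "geneA_i2": {"GO:0008150"},
    "geneB_i1": {"GO:0005575"},
    "geneB_i2": {"GO:0005575"},
}

specific = isoform_specific_terms(toy_annotations)
print(specific)  # {'geneA': {'GO:0003674'}}
```

If the result is empty for (almost) all genes, the isoforms carry the same terms and the difference between the two runs is mostly a matter of counts rather than content.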

ADD REPLY • link 5.9 years ago by lieven.sterck 15k

GO annotation only gives you a general picture of your gene list and nothing more; you are unlikely to find isoform-specific terms.

ADD REPLY • link 5.7 years ago by seta ★ 1.9k