Question

gene ontology with reference list

Asked 4.5 years ago by biosol

Hi all,

I would like to run a gene ontology (GO) enrichment analysis on the results of an RNA-seq differential expression analysis. For this, I would like to use a set of reference (background) genes. I have already tried the GO web tool, but it gives me an error saying that my list contains duplicate genes, although I've checked and it doesn't. Could anyone suggest a different tool? Or does anyone have an idea why I'm getting this error? Thanks a lot in advance!

Tags: RNA-Seq, gene ontology · 1.7k views

What is the output of the following two commands (assuming a plain text file with one gene per row)?

1) sort -k1,1 your.genes | wc -l
2) sort -k1,1 -u your.genes | wc -l

If the two counts differ, the file contains duplicate entries.
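A slightly stricter version of the two commands above, sketched in Python: besides counting, it reports which IDs are duplicated, and it first strips whitespace (including a trailing '\r' from Windows line endings) and ignores case. IDs that differ only in those respects pass `sort -u` as distinct; whether the web tool normalizes them the same way is an assumption, not something confirmed here. The function name and the example IDs are hypothetical.

```python
from collections import Counter


def find_duplicates(lines):
    """Return IDs occurring more than once after light normalization.

    Assumed normalization: strip surrounding whitespace (including a
    trailing '\r') and compare case-insensitively. Blank lines are skipped.
    """
    ids = [line.strip().upper() for line in lines if line.strip()]
    counts = Counter(ids)
    return {gene: n for gene, n in counts.items() if n > 1}


# Hypothetical input: 'TP53' and 'tp53\r' collapse to the same ID.
example = ["TP53\n", "tp53\r\n", "BRCA1\n", "EGFR\n"]
print(find_duplicates(example))  # → {'TP53': 2}
```

To check a real file, pass the open file handle, e.g. `with open("your.genes") as fh: print(find_duplicates(fh))`. An empty dict means no duplicates even under this looser comparison.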

(reply by ATpoint, 4.5 years ago)

The output of both commands is the same: 17647 genes... Thank you anyway ;)

(reply by biosol, 4.5 years ago)

Which tool did you use and how?

(reply by Jean-Karim Heriche, 4.5 years ago)

I used the Gene Ontology (GO) web tool (http://geneontology.org/). I just pasted my list on the webpage, and it redirected me to PANTHER, where I can add my reference gene list. At this point it tells me that my reference list contains duplicate genes...

(reply by biosol, 4.5 years ago)

The error is most likely caused by PANTHER converting your input IDs or gene names to the protein accession numbers it uses. Several of your inputs then map to the same protein, and these are reported as duplicates even though the original IDs are all distinct.
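To see how distinct inputs can become "duplicates" after conversion, here is a minimal Python sketch that groups input IDs by the accession they map to and keeps only accessions hit more than once. It assumes you already have an input-ID-to-accession mapping from some ID-conversion resource; the mapping below, the function name, and all IDs are made up for illustration.

```python
from collections import defaultdict


def collapsed_inputs(id_to_protein):
    """Return {accession: [input IDs]} for accessions hit by 2+ inputs."""
    by_protein = defaultdict(list)
    for input_id, accession in id_to_protein.items():
        by_protein[accession].append(input_id)
    return {acc: ids for acc, ids in by_protein.items() if len(ids) > 1}


# Hypothetical mapping: two distinct input IDs resolve to the same accession,
# so the list looks duplicated only after conversion.
mapping = {
    "GENE_A_ID1": "PROT_X",
    "GENE_A_ID2": "PROT_X",
    "GENE_B_ID1": "PROT_Y",
}
print(collapsed_inputs(mapping))  # → {'PROT_X': ['GENE_A_ID1', 'GENE_A_ID2']}
```

Keeping one input per accession in the collapsed groups would be one way to produce a reference list without such collisions.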

(reply by Jean-Karim Heriche, 4.5 years ago)

Answer 1 (2019-10-16)

Answered 4.5 years ago by lihe.liu

Another way could be:

1. Download all the GO information (GO ID, term name, and the genes annotated to it) from a database such as Ensembl or an org.xx.eg.db package.
2. Extract every GO category and the genes involved.
3. Loop over the categories: for each GO term, build a 2×2 contingency table and compute the p-value with Fisher's exact test.

This requires programming in R.
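The loop in step 3 can be sketched as follows. This is Python rather than R, only to keep the examples here in one language. The one-sided Fisher's exact p-value for over-representation equals the upper tail of a hypergeometric distribution, computed here with `math.comb`. It assumes `go_to_genes` is a dict (term → gene IDs) built in steps 1 and 2; the function names, term names, and genes are hypothetical. Like the recipe itself, it tests each term independently and applies no multiple-testing correction.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    N: genes in the reference (background) list
    K: background genes annotated to the GO term
    n: study genes (e.g. DE genes) that are also in the background
    k: study genes annotated to the GO term
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def go_enrichment(study, background, go_to_genes):
    """Test every term in go_to_genes (term -> iterable of gene IDs)."""
    background = set(background)
    study = set(study) & background  # genes outside the reference list are ignored
    N, n = len(background), len(study)
    results = {}
    for term, genes in go_to_genes.items():
        annotated = set(genes) & background
        K = len(annotated)
        if K == 0:
            continue  # term has no genes in the reference list
        k = len(study & annotated)
        # 2x2 table: [[k, n - k], [K - k, (N - K) - (n - k)]]
        results[term] = enrichment_pvalue(k, n, K, N)
    return results


# Made-up example: 10 background genes, 3 study genes, two hypothetical terms.
background = [f"G{i}" for i in range(1, 11)]
study = ["G1", "G2", "G3"]
go_to_genes = {"TERM_A": ["G1", "G2", "G3", "G4"], "TERM_B": ["G5", "G6", "G7", "G8"]}
for term, p in go_enrichment(study, background, go_to_genes).items():
    print(term, round(p, 4))
```

Here TERM_A gets p = 4/120 ≈ 0.0333 (all three study genes fall in a 4-gene term), and TERM_B gets 1.0 (no study genes in the term).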

While you can certainly do this, it is not very efficient, either computationally or statistically, because it doesn't account for the graph structure of the Gene Ontology: genes annotated to a term are also annotated to its parent terms, so the tests are not independent. You'll also need to correct for multiple testing. Since you mention R, check out the packages already available for this purpose, such as topGO, clusterProfiler, and goseq.
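For the multiple-testing point, here is a minimal Python sketch of the Benjamini-Hochberg adjustment (the same method as R's `p.adjust(p, method = "BH")`), which could be applied to the per-term p-values from a loop like the one described above. It does not address the GO graph structure; for that, the packages mentioned (for example topGO's elim and weight algorithms) are the better route. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

For this input the adjusted values are approximately [0.04, 0.0533, 0.0533, 0.2]: the p-value 0.03 inherits 0.0533 from the next-larger one because BH adjustments must not decrease with rank.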

(reply by Jean-Karim Heriche, 4.5 years ago)

Answer 2 (2019-10-29)

Answered 4.5 years ago by O'kin-1

Another good tool might be https://string-db.org. Make sure to log in so that you can upload your background/reference list. If you are comfortable with R, you can also install it as an R package (STRINGdb on Bioconductor) and run your analysis there.

Good luck.
