gene ontology with reference list
4.5 years ago
biosol ▴ 170

Hi all,

I would like to run an enrichment/gene ontology analysis after an RNA-seq differential expression analysis, using my own set of reference (background) genes. I have already tried GO, but it gives me an error: it says that I have duplicate genes, although I have checked and there are none. Could anyone suggest a different tool? Or does anyone have an idea why I'm getting this error? Thanks a lot in advance!

RNA-Seq gene ontology • 1.7k views

What is the output of (assuming a plain text file with one gene per row):

1) sort -k1,1 your.genes | wc -l
2) sort -k1,1 -u your.genes | wc -l


The output of both commands is the same: 17647 genes... Thank you anyway ;)


Which tool did you use and how?


I used the Gene Ontology (GO) web tool: http://geneontology.org/. I just pasted my list into the page, and it redirected me to PANTHER, where I can add my reference gene list. At that point it tells me that I have duplicate genes in my reference list...


The error is most likely caused by your input IDs or names being converted to the protein accession numbers used by PANTHER. This means that several of your inputs map to the same protein, so the list can contain duplicates after conversion even though your original IDs are unique.
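To see which inputs collapse onto the same protein, you could group your IDs by the accession they map to. The following is only a minimal sketch in Python: it assumes you already have a table of input ID to mapped accession from somewhere, and the function name `find_mapping_collisions` and the example IDs and accessions are made up for illustration. It does not reproduce PANTHER's own mapping.

```python
from collections import defaultdict


def find_mapping_collisions(id_to_accession):
    """Return accessions that more than one input ID maps to.

    id_to_accession: dict of input gene ID -> mapped protein accession
    (hypothetical input; obtain the real mapping from your tool of choice).
    """
    by_accession = defaultdict(list)
    for gene_id, accession in id_to_accession.items():
        by_accession[accession].append(gene_id)
    # Keep only accessions hit by two or more distinct input IDs.
    return {acc: sorted(ids) for acc, ids in by_accession.items() if len(ids) > 1}


# Made-up example: two distinct gene IDs that map to the same accession.
example_mapping = {
    "geneA": "P00001",
    "geneA_alt": "P00001",
    "geneB": "P00002",
}
collisions = find_mapping_collisions(example_mapping)
print(collisions)
```

Any accession listed in the result is a candidate for the "duplicate" that the tool complains about; keeping one input ID per accession in the reference list should make the list unique after conversion.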

4.5 years ago
lihe.liu ▴ 30

Another way could be:

  • Download all the GO information (ID, Name, and genes involved) from a certain database (e.g. Ensembl or org.xx.eg.db)
  • Extract every GO category and the genes involved.
  • Loop over the GO categories: for each one, build a 2 by 2 contingency table and get the corresponding p-value using Fisher's exact test.
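The steps above can be sketched as follows. This is an illustration in Python using only the standard library, not a reference implementation: the function names, the gene IDs and the term names are hypothetical, and the test is the one-sided (over-representation) form of Fisher's exact test, computed as a hypergeometric tail probability. Genes outside the reference list are ignored.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test for over-representation.

    k: DE genes annotated to the term
    n: DE genes in total
    K: reference genes annotated to the term
    N: reference genes in total
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def go_enrichment(de_genes, reference_genes, go_to_genes):
    """Run one test per GO term; returns {term: raw p-value}."""
    reference = set(reference_genes)
    de = set(de_genes) & reference  # restrict DE genes to the background
    results = {}
    for term, genes in go_to_genes.items():
        annotated = set(genes) & reference
        k = len(de & annotated)
        results[term] = enrichment_p_value(k, len(de), len(annotated), len(reference))
    return results


# Made-up example: 10 reference genes, 3 DE genes, 2 hypothetical terms.
reference = [f"g{i}" for i in range(1, 11)]
de = ["g1", "g2", "g3"]
terms = {"term_A": ["g1", "g2", "g3"], "term_B": ["g8", "g9"]}
for term, p in go_enrichment(de, reference, terms).items():
    print(term, p)
```

The p-values returned here are raw, one per term, and are not corrected for the number of terms tested.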

You would have to program this yourself, for example in R.


While you can certainly do this, it is not very efficient, either computationally or statistically, because it doesn't account for the graph structure of the Gene Ontology. You will also need to correct for multiple testing. Since you mention R, check out the packages that are already available for this purpose, such as topGO, clusterProfiler, and goseq.
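If you do run one test per GO term, the multiple-testing step could look like the sketch below. It uses the Benjamini-Hochberg procedure, which is one common choice and an assumption on my part, not something specified in this thread. The function name `benjamini_hochberg` and the example p-values are made up. It does nothing about the graph structure of the ontology; that is what the dedicated packages are for.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values, one per GO term.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Terms would then be reported as enriched when the adjusted value falls below the chosen false discovery rate threshold.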

4.5 years ago
O'kin-1 ▴ 20

Another good tool might be https://string-db.org. Make sure to log in so that you can upload your background/reference list. It is also available as an R package, so you can run your analysis there if you are comfortable with that option.

Good luck.
