Question: Gene Ontology analysis with a reference list
sonia.olaechea wrote (8 months ago):

Hi all,

I would like to run an enrichment / Gene Ontology (GO) analysis on the results of an RNA-seq differential expression analysis, using my own set of reference (background) genes. I have already tried GO, but it gives me an error: it says my list contains duplicate genes, although I have checked and there are none. Could anyone suggest a different tool? Or does anyone have an idea why I'm getting this error? Thanks a lot in advance!

What is the output of the following commands (assuming a plain-text file with one gene per line)?

1) sort -k1,1 your.genes | wc -l
2) sort -k1,1 -u your.genes | wc -l

(reply by ATpoint, 8 months ago)

The output of both commands is the same: 17647 genes... Thank you anyway ;)

(reply by sonia.olaechea, 8 months ago)
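Identical counts rule out exact duplicates, but not IDs that differ only in letter case, surrounding spaces or Windows (CRLF) line endings: sort -u treats TP53, tp53 and TP53 followed by a carriage return as three different lines, while a web tool that normalises its input may treat them as one gene. The shell function below is a minimal sketch that normalises each line before looking for duplicates. The function name is hypothetical, and comparing IDs case-insensitively is an assumption about how the tool matches them, not documented PANTHER behaviour.

```shell
# find_near_duplicates FILE  (hypothetical helper name)
# Prints IDs that occur more than once after normalisation:
# CR characters removed, leading/trailing spaces and tabs trimmed,
# letters upper-cased (assumes case-insensitive comparison).
find_near_duplicates() {
  # strip carriage returns left by Windows (CRLF) line endings
  tr -d '\r' < "$1" |
    # trim surrounding blanks and upper-case the ID
    awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print toupper($0) }' |
    # ignore lines that are empty after trimming
    grep -v '^$' |
    # list each duplicated ID once
    sort | uniq -d
}
```

For example, find_near_duplicates reference_genes.txt (a placeholder file name) prints each normalised ID that occurs more than once, and prints nothing if the list is clean.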

Which tool did you use and how?

(reply by Jean-Karim Heriche, 8 months ago)

I used the Gene Ontology (GO) web tool (http://geneontology.org/). I only pasted my list into the web page, and it redirected me to PANTHER, where I can add my reference gene list. At that point it tells me that there are duplicate genes in my reference list...

(reply by sonia.olaechea, 8 months ago)

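The error is most likely caused by your input IDs or names being converted to the protein accession numbers that PANTHER uses. If several of your inputs map to the same protein (for example, synonyms or different identifiers for the same gene), that protein appears more than once in the converted list, even though your original IDs are all unique.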

(reply by Jean-Karim Heriche, 8 months ago)

lihe.liu wrote (8 months ago):

Another way could be:

  • Download all GO annotations (term ID, name and annotated genes) from a database such as Ensembl or the org.xx.eg.db package for your species.
  • Extract each GO term and the genes annotated to it.
  • Loop over the GO terms: for each term, build a 2 x 2 contingency table (genes in your list vs. not in your list, annotated to the term vs. not annotated) and compute the corresponding p-value with Fisher's exact test.

This approach requires programming in R.

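The steps above can also be sketched outside R. The shell/awk function below is a minimal illustration, assuming three plain-text inputs with hypothetical layouts: the study genes (one ID per line), the reference/background genes (one ID per line) and a tab-separated gene-to-GO annotation table (gene ID, GO ID). For each GO term it computes the one-sided Fisher's exact test for enrichment, which equals the upper tail of the hypergeometric distribution:

P(X >= k) = sum for x = k .. min(K, n) of C(K, x) * C(N - K, n - x) / C(N, n)

where N is the number of background genes, n the number of study genes found in the background, K the number of background genes annotated to the term and k the number of study genes annotated to it. Only background genes are counted. The sketch uses the annotations exactly as given (no propagation to ancestor terms) and applies no multiple-testing correction.

```shell
# go_fisher STUDY BACKGROUND ANNOTATION   (hypothetical helper name)
#   STUDY       one gene ID per line (e.g. the DE genes)
#   BACKGROUND  one gene ID per line (the reference list)
#   ANNOTATION  tab-separated "gene ID <TAB> GO ID" (assumed layout)
# Output, sorted by p: GO_ID  k  n  K  N  p
# where p is the one-sided (enrichment) Fisher's exact test p-value.
go_fisher() {
  awk -F'\t' '
    NF == 0 { next }
    FILENAME == ARGV[1] { study[$1] = 1; next }
    FILENAME == ARGV[2] { bg[$1] = 1; next }
    # annotation rows: keep background genes only, each gene/term pair once
    ($1 in bg) && $2 != "" && !(($1, $2) in pair) {
      pair[$1, $2] = 1
      K[$2]++
      if ($1 in study) k[$2]++
    }
    END {
      N = 0; for (g in bg) N++
      n = 0; for (g in study) if (g in bg) n++
      # lf[i] = log(i!), used to evaluate binomial coefficients in log space
      lf[0] = 0
      for (i = 1; i <= N; i++) lf[i] = lf[i - 1] + log(i)
      for (t in K) {
        kt = (t in k) ? k[t] : 0
        hi = (K[t] < n) ? K[t] : n
        # upper tail of the hypergeometric distribution: P(X >= kt)
        p = 0
        for (x = kt; x <= hi; x++)
          p += exp(lchoose(K[t], x) + lchoose(N - K[t], n - x) - lchoose(N, n))
        if (p > 1) p = 1
        printf "%s\t%d\t%d\t%d\t%d\t%.6g\n", t, kt, n, K[t], N, p
      }
    }
    function lchoose(a, b) { return lf[a] - lf[b] - lf[a - b] }
  ' "$1" "$2" "$3" | sort -t "$(printf '\t')" -k6,6g
}
```

Usage would be along the lines of go_fisher de_genes.txt reference_genes.txt gene2go.tsv (placeholder file names). Working with log-factorials keeps the binomial coefficients from overflowing for backgrounds of tens of thousands of genes.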

While you can certainly do this, it is not very efficient, either computationally or statistically, because it doesn't account for the graph structure of the Gene Ontology: a gene annotated to a term is implicitly annotated to all of that term's ancestor terms, so the tests for related terms are not independent. You would also need to correct for multiple testing. Since you mention R, check out the packages that already exist for this purpose, such as topGO, clusterProfiler and goseq.

(reply by Jean-Karim Heriche, 8 months ago)
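For the multiple-testing part, one common choice is the Benjamini-Hochberg false discovery rate adjustment (what R computes with p.adjust(p, method = "BH")). A minimal shell/awk sketch, assuming input lines of the form ID<TAB>p-value and a hypothetical function name, sorts by p-value and applies the step-up rule q(i) = min over j >= i of min(1, p(j) * m / j):

```shell
# bh_adjust  (hypothetical helper name)
# Reads "ID<TAB>p-value" lines on stdin; prints "ID<TAB>p<TAB>q" sorted by p,
# where q is the Benjamini-Hochberg adjusted p-value.
bh_adjust() {
  sort -t "$(printf '\t')" -k2,2g |
    awk -F'\t' '
      { id[NR] = $1; p[NR] = $2 }
      END {
        m = NR
        minq = 1
        # walk from the largest p-value down, keeping a running minimum
        for (i = m; i >= 1; i--) {
          q = p[i] * m / i
          if (q < minq) minq = q
          adj[i] = minq
        }
        for (i = 1; i <= m; i++) printf "%s\t%s\t%.6g\n", id[i], p[i], adj[i]
      }
    '
}
```

This only handles multiplicity; it does nothing about the dependence between related GO terms, which is what the GO-aware methods in packages such as topGO are designed to address.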

O'kin-1 wrote (8 months ago):

Another good tool might be STRING (https://string-db.org). Make sure to log in so that you can upload your background/reference list. If you are comfortable with R, you can also install it as an R package (STRINGdb, on Bioconductor) and run your analysis there.

Good luck.
