GeneSCF gives out more pathways for genes compared to DAVID
7.8 years ago
a.james ▴ 240

Hello All,

I am using GeneSCF on a set of differentially expressed (DE) genes with the KEGG database, and it gives me more significantly enriched pathways than the DAVID tool does on the same set of genes.

I also noticed that, for each pathway, it reports more genes as enriched than are in my input set of DE genes. I am using gene symbols as input.

It would be great if someone had a suggestion about this.

Thank you

RNA-Seq ChIP-Seq GeneSCF • 3.5k views
ADD COMMENT

Do genes that are not present in your input list show up in your enriched terms? What was your input: Entrez IDs or gene symbols?

ADD REPLY

Yes, I saw 2 additional genes for a specific pathway. My input is gene symbols.

ADD REPLY

First of all, I want to clarify that GeneSCF converts gene symbols to Entrez IDs so that it can match the source database (especially KEGG) more efficiently. A single gene symbol can match multiple Entrez IDs, in which case you will end up with one or two extra genes.

This is also explained in these posts:

http://bioinfoblog.it/tag/gene-conversion/

"When you convert symbols to id, it is important to remember that not only the same gene can have more than one symbol, but also the same symbol can match multiple entrez ids."

Why does biomart return multiple Entrez IDs
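
To make the one-to-many point concrete, here is a minimal sketch of how a symbol-to-ID conversion can return more IDs than symbols. The mapping table, gene names and IDs are all made up for illustration; this is not how GeneSCF stores or performs its conversion.

```python
from collections import defaultdict

# Hypothetical symbol -> Entrez ID pairs; names and IDs are invented.
SYMBOL_TO_ENTREZ = [
    ("GENE_A", "1001"),
    ("GENE_B", "1002"),
    ("GENE_B", "1003"),  # ambiguous symbol: maps to two IDs
    ("GENE_C", "1004"),
]


def convert_symbols(symbols, pairs):
    """Return {symbol: sorted list of Entrez IDs} for the requested symbols."""
    lookup = defaultdict(set)
    for symbol, entrez_id in pairs:
        lookup[symbol].add(entrez_id)
    return {s: sorted(lookup.get(s, ())) for s in symbols}


input_symbols = ["GENE_A", "GENE_B"]
converted = convert_symbols(input_symbols, SYMBOL_TO_ENTREZ)

# Count the IDs that come out and flag symbols with more than one match.
n_ids = sum(len(ids) for ids in converted.values())
ambiguous = [s for s, ids in converted.items() if len(ids) > 1]

print(f"{len(input_symbols)} symbols -> {n_ids} Entrez IDs")
print("ambiguous symbols:", ambiguous)
```

Checking your own input for ambiguous symbols in this way should show which extra genes come from the conversion step.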

ADD REPLY

OK, I got it. Could you please look at the first part of the question?

When I run a functional pathway analysis with GeneSCF, it reports 33 significant pathways (P_val <= 0.05), whereas DAVID with KEGG showed only 1 pathway. It's the same with STRING.

ADD REPLY

1) For the DAVID p-value, please check whether you have selected the option for Fisher's exact test, because the default DAVID p-value is the EASE score, a modified Fisher's exact test (https://david.ncifcrf.gov/helps/functional_annotation.html#fisher).

2) Also compare the number of background genes used by DAVID with the number you gave GeneSCF as background.

The above two factors can influence your results drastically.

3) Please also make sure that DAVID uses a recent release of the KEGG pathway list (GeneSCF uses the current release):

http://biorxiv.org/content/biorxiv/early/2016/04/19/049288.full.pdf (page 3)

DAVID release notes (they are not clear about KEGG updates):

https://david-d.ncifcrf.gov/content.jsp?file=release.html

https://david.ncifcrf.gov/content.jsp?file=update.html

KEGG release notes:

http://www.kegg.jp/kegg/docs/relnote.html
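
As a rough illustration of point 1, the sketch below compares a one-sided Fisher's exact (hypergeometric) p-value with a more conservative variant that removes one gene from the overlap before testing. The counts are illustrative, and `ease_like` is my simplified reading of the EASE idea, not DAVID's actual implementation.

```python
from math import comb


def upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N background genes,
    K of which belong to the pathway (one-sided Fisher's exact test)."""
    if k <= 0:
        return 1.0
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


def ease_like(k, n, K, N):
    """Conservative variant: drop one gene from the overlap (and from
    the list) before testing. An approximation for illustration only."""
    return upper_tail(k - 1, n - 1, K, N)


# Illustrative counts: 3 of 300 input genes hit a pathway that
# contains 40 of the 30000 background genes.
k, n, K, N = 3, 300, 40, 30000
p_fisher = upper_tail(k, n, K, N)
p_ease = ease_like(k, n, K, N)

print(f"Fisher-style p-value: {p_fisher:.4f}")
print(f"EASE-like p-value:    {p_ease:.4f}")
```

The conservative variant always gives a p-value at least as large as the plain test, so a tool that uses it will tend to report fewer pathways at the same cutoff.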

ADD REPLY

OK, got it. Thank you so much, I will check those points.

ADD REPLY

I also have a question: when I try to view the pathways in Cytoscape, it also results in no pathways.

ADD REPLY

Which plugin are you using in Cytoscape?

ADD REPLY

ClueGO is the plugin I am using. I am not sure what reference set I have to give in this case, because with the default settings the KEGG pathway analysis does not show any pathways.

ADD REPLY

http://www.ici.upmc.fr/cluego/ClueGODocumentation.pdf (Page 17)

"The PValue is calculated with Fisher Exact Test. Several methods for PValue correction are proposed: Bonferroni, Bonferoni step-down and Benjamini-Hochberg. We consider as reference the total number of the genes associated with all the terms included in the ontology source used."

From the above statement I conclude that ClueGO uses, as its reference, the total number of genes associated with all terms/pathways from the source database. For human KEGG pathways (up to Release 78.1) there are ~6980 genes associated with 301 pathways. You might be using a larger number of background genes in GeneSCF, which would explain why GeneSCF picks up more pathways as enriched than ClueGO.

(Tip: Try reducing the number of background genes in GeneSCF to something close to the KEGG figure above; this might give you some idea.)
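
To see how much the background number alone can matter, here is a small sketch that holds the overlap fixed and changes only the background size. The overlap, list size and pathway size are made-up numbers; only the two background values (about 6980 and 30000) come from this thread, and the test is a plain one-sided hypergeometric test, which may differ from what either tool does internally.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k): n input genes,
    K pathway genes, N background genes."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


# Illustrative values: 3 of 300 input genes fall in a 40-gene pathway.
k, n, K = 3, 300, 40

# Same overlap, two different background sizes.
p_by_background = {N: enrichment_p(k, n, K, N) for N in (6980, 30000)}

for N, p in p_by_background.items():
    print(f"background = {N:>5}: p = {p:.4f}")
```

With the larger background the same overlap looks rarer by chance, so the p-value drops and more pathways pass a fixed cutoff.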

If you would like to visualize only the pathways enriched by GeneSCF in Cytoscape, adjust the filters in ClueGO:

1) Use only the genes associated with the enriched terms on Cytoscape (predefined)

2) Use predefined custom pathways

3) Do not use any filters on Cytoscape

4) Use the network feature.

This network visualization feature will eventually be integrated into GeneSCF (though you will probably have to wait a long time, and the tool will still be command-line only).

ADD REPLY

Thank you so much for the detailed description. We are aiming to plot the pathways that resulted from GeneSCF using ClueGO. So giving ClueGO the genes from the enriched pathways, in separate files, would do this, wouldn't it? Also, I used the default number of background genes in GeneSCF (30,000), whereas the number of genes in KEGG pathways is far smaller than that. Please let me know why that is.

ADD REPLY

1) For network visualization:

a) Yes, provide the list of genes from your enriched terms as input and run ClueGO without any filtering.

b) You will get a network containing the terms enriched in GeneSCF along with other terms. Select the terms you want in the network (click the nodes, holding SHIFT for multiple selection), then from the menu choose Select -> Select adjacent edges, then Select -> Nodes -> Nodes connected by selected edges (Genes), and finally File -> New -> Network -> From selected nodes, selected edges.

c) You have now constructed a new network from the selected nodes and edges. Experiment with the different "Layout" options.
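
For step a), the gene list can be assembled with a few lines of scripting. The sketch below assumes you have already parsed your enrichment results into a dictionary of pathway name to genes; the pathway and gene names are placeholders, and this does not read the GeneSCF output format directly.

```python
# Hypothetical enrichment results: pathway name -> input genes found in it.
enriched = {
    "pathway_1": ["GENE_A", "GENE_B"],
    "pathway_2": ["GENE_B", "GENE_C"],
    "pathway_3": ["GENE_D"],
}


def genes_for_terms(results, wanted_terms):
    """Return the sorted, de-duplicated genes from the chosen pathways."""
    genes = set()
    for term in wanted_terms:
        genes.update(results[term])
    return sorted(genes)


# Keep only the pathways you want to see in the network.
gene_list = genes_for_terms(enriched, ["pathway_1", "pathway_2"])

# One gene per line, ready to paste into a gene-list input box.
print("\n".join(gene_list))
```

Genes shared between pathways appear only once, which avoids duplicate entries in the input.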

I hope this is clear (it is hard to explain without a video, but I tried!).

I found a somewhat similar tutorial on constructing a network for selected terms; follow it from time 0.47:

https://youtu.be/KIJ6M1nvKoY?list=PLA8676BE318D53B40&t=45


2) Which version of GeneSCF are you using? It looks like you are using v1.0. If so, please adjust your background gene count to suit your analysis (it can be the total number of genes expressed in your experiment, irrespective of significance, or the total number of protein-coding genes in your reference annotation, e.g. GENCODE or Ensembl).

Tip: You can also try GeneSCF v1.1, which uses updated databases for enrichment analysis. Prefer v1.0 only if you need a custom database for the enrichment analysis. Also consider filtering the enriched terms by p-value plus num_of_genes or percentage% (choose the cutoff based on the number of input genes).
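
The filtering suggested in the tip could look something like the sketch below. The rows, field names and cutoffs are all hypothetical stand-ins for whatever you parse from your own results table; choose thresholds that suit the size of your input list.

```python
# Hypothetical enrichment rows; field names and values are invented.
rows = [
    {"term": "pathway_1", "p_value": 0.001, "num_genes": 12, "percentage": 8.0},
    {"term": "pathway_2", "p_value": 0.030, "num_genes": 1, "percentage": 0.7},
    {"term": "pathway_3", "p_value": 0.200, "num_genes": 9, "percentage": 6.0},
]


def filter_terms(rows, max_p=0.05, min_genes=3, min_percentage=None):
    """Keep terms passing the p-value cutoff and a size cutoff.
    If min_percentage is given it replaces the gene-count cutoff."""
    kept = []
    for row in rows:
        if row["p_value"] > max_p:
            continue
        if min_percentage is not None:
            if row["percentage"] < min_percentage:
                continue
        elif row["num_genes"] < min_genes:
            continue
        kept.append(row["term"])
    return kept


print(filter_terms(rows))                      # p-value + gene count
print(filter_terms(rows, min_percentage=0.5))  # p-value + percentage
```

A term supported by a single gene can pass the p-value cutoff, so the second criterion helps to drop hits that rest on very few genes.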

ADD REPLY
1) Thank you so much, it's very clear.

2) "Which version of GeneSCF are you using? It looks like you are using V1.0." Yes, you are right.

I have tried both versions. The results from the old version, v1.0, overlap considerably with the pathways from the other platform in my comparison study, which is why I am sticking with the older version.

OK, I can try altering the background number, but I am afraid I will lose the overlap I have with the other platform.

ADD REPLY

If you want to maintain consistency with your old analysis, you can continue using v1.0.

ADD REPLY
