Question: GO enrichment analysis for a non-model organism using a text file of all genes and their associated GO IDs
2.4 years ago, joynathan2 wrote:

I have a mapping of every gene in my organism to its GO IDs. Some genes don't have a GO ID. Below is an example of the format I have.

SeqName    length    score    eValue    hitName    GOs    ACC
Anisa00001    1185    656.751    0.0    gi|498338734|ref|WP_010652890.1|MULTISPECIES: integrase [Legionellaceae]      GO:0015074,GO:0006310,GO:0003677    WP_010652890.1,YP_006506632.1,CCD06721.1,ETO94118.1
Anisa00001    1185    466.463    6.16098E-161    gi|493924933|ref|WP_006869768.1|integrase [Legionella drancourtii]      GO:0015074,GO:0006310,GO:0003677    WP_006869768.1,EHL32123.1
Anisa00001    1185    424.861    1.43813E-144    gi|502743862|ref|WP_012978846.1|integrase [Legionella longbeachae]      GO:0015074,GO:0006310,GO:0003677    WP_012978846.1,YP_003454139.1,CBJ10990.1
Anisa00001    1185    423.705    4.53406E-144    gi|499535659|ref|WP_011216442.1|integrase [Legionella pneumophila]      GO:0015074,GO:0006310,GO:0003677    WP_011216442.1,YP_127815.1,CAH16726.1
Anisa00001    1185    419.468    2.18018E-142    gi|570283998|gb|AHE66185.1|site-specific recombinase XerD [Legionella oakridgensis ATCC 33761 = DSM 21215]      GO:0015074,GO:0006310,GO:0003677    AHE66185.1,ETO93999.1
Anisa00001    1185    413.69    4.08724E-140    gi|499526807|ref|WP_011213447.1|integrase [Legionella pneumophila]      GO:0015074,GO:0006310,GO:0003677    WP_011213447.1,YP_123415.1,CAH12239.1,ERB41210.1,ERH42153.1,ERI47405.1
Anisa00001    1185    413.69    6.12215E-140    gi|504092896|ref|WP_014326890.1|integrase [Legionella pneumophila]      GO:0015074,GO:0006310,GO:0003677    WP_014326890.1,YP_005185414.1,AEW51315.1
Anisa00001    1185    409.068    2.69005E-138    gi|499533894|ref|WP_011215175.1|integrase [Legionella pneumophila]      GO:0015074,GO:0006310,GO:0003677    WP_011215175.1,YP_126420.1,YP_006508280.1,CAH15303.1,CCD08390.1,KGP62973.1
Anisa00001    1185    404.831    1.16647E-136    gi|698843809|emb|CEG57778.1|Phage integrase family site-specific recombinase [Legionella fallonii LLAP-10]     GO:0015074,GO:0006310,GO:0003677    CEG57778.1
Anisa00003    297    176.022    7.21027E-56    gi|498338732|ref|WP_010652888.1|MULTISPECIES: hypothetical protein [Legionellaceae]      GO:0043565,GO:0003677    WP_010652888.1,YP_006506634.1,CCD06723.1,ETO94116.1
Anisa00003    297    120.168    9.08058E-34    gi|698843811|emb|CEG57780.1|conserved protein of unknown function [Legionella fallonii LLAP-10]     GO:0043565,GO:0003677    CEG57780.1
Anisa00003    297    93.9745    3.23067E-23    gi|493924102|ref|WP_006868999.1|hypothetical protein [Legionella drancourtii]      GO:0043565,GO:0003677    WP_006868999.1,EHL32770.1
Anisa00003    297    80.1073    5.8726E-18    gi|447092960|ref|WP_001170216.1|hypothetical protein [Leptospira interrogans]      GO:0043565,GO:0003677    WP_001170216.1,EMM81225.1
Anisa00003    297    78.1814    4.04598E-17    gi|489065186|ref|WP_002975201.1|DNA-binding helix-turn-helix protein [Leptospira terpstrae]      GO:0043565,GO:0003677    WP_002975201.1,EMY59958.1
Anisa00003    297    77.7962    5.73396E-17    gi|505585864|ref|WP_015678427.1|DNA-binding helix-turn-helix protein [Leptospira yanagawae]      GO:0043565,GO:0003677    WP_015678427.1,EOQ87907.1
Anisa00003    297    76.2554    2.31193E-16    gi|523642128|ref|WP_020778299.1|DNA-binding helix-turn-helix protein [Leptospira meyeri]      GO:0043565,GO:0003677    WP_020778299.1,EMJ85365.1
Anisa00003    297    75.8702    3.27591E-16    gi|489067540|ref|WP_002977532.1|DNA-binding helix-turn-helix protein [Leptospira vanthielii]      GO:0043565,GO:0003677    WP_002977532.1,EMY71198.1
Anisa00003    297    75.485    3.86738E-16    gi|490606321|ref|WP_004471330.1|MULTISPECIES: Cro/C1-type HTH DNA-binding domain protein [Leptospira]      GO:0043565,GO:0003677    WP_004471330.1,EKO33151.1,EKO78480.1,EMI68067.1,EMN22335.1,EMO21363.1
Anisa00003    297    75.8702    5.5926E-16    gi|488857175|ref|WP_002769485.1|hypothetical protein [Leptonema illini]      GO:0043565,GO:0003677    WP_002769485.1,EHQ05131.1
Anisa00003    297    75.0998    6.57594E-16    gi|501452877|ref|WP_012476326.1|hypothetical protein [Leptospira biflexa]      GO:0043565,GO:0003677    WP_012476326.1,YP_001963262.1,ABZ94684.1
Anisa00003    297    73.559    1.84265E-15    gi|685200322|gb|AIN94425.1|hypothetical protein JO40_10230 [Treponema putidum]     GO:0003677    AIN94425.1
Anisa00004    1233    466.463    1.24221E-160    gi|502743845|ref|WP_012978829.1|outer membrane-specific lipoprotein transporter subunit ; membrane component of ABC superfamily [Legionella longbeachae]      GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    WP_012978829.1,YP_003454078.1,CBJ10928.1
Anisa00004    1233    462.996    2.88424E-159    gi|698844250|emb|CEG58219.1|Lipoprotein-releasing system transmembrane protein LolC [Legionella fallonii LLAP-10]     GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    CEG58219.1
Anisa00004    1233    459.529    6.69725E-158    gi|570285197|gb|AHE67384.1|lipoprotein releasing system, transmembrane protein, LolC/E family [Legionella oakridgensis ATCC 33761 = DSM 21215]      GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    AHE67384.1,ETO93044.1
Anisa00004    1233    453.751    1.30772E-155    gi|504657357|ref|WP_014844459.1|outer membrane-specific lipoprotein ABC transporter permease [Legionella pneumophila]      GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    WP_014844459.1,YP_006509488.1,CCD09630.1
Anisa00004    1233    452.981    2.6206E-155    gi|506459505|ref|WP_015961405.1|hypothetical protein [Legionella pneumophila]      GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    WP_015961405.1,YP_124551.1,CAH13391.1,ERB41693.1,ERH43995.1,ERI49100.1
Anisa00004    1233    452.21    5.28713E-155    gi|499535375|ref|WP_011216187.1|hypothetical protein [Legionella pneumophila]      GO:0042954,GO:0042953,GO:0016021,GO:0016020,GO:0005886    WP_011216187.1,YP_127546.1,CAH16451.1

What I want to do is an enrichment analysis for a subset of the genes above, to see which GO terms are enriched in that subset. Basically, I am looking for Fisher's exact test or a hypergeometric test. Is there any program that anyone could suggest?
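
For reference, the test being asked about can be sketched in plain Python. This is a minimal illustration, not any tool's implementation. It assumes the table above is whitespace-separated with the gene ID in the first column, that a gene's annotation is the union of GO IDs over all of its hit rows, and that the background is every gene in the file. The function names (read_gene2go, hypergeom_sf, enrich) are made up for this example.

```python
import re
from collections import defaultdict
from math import comb

GO_RE = re.compile(r"GO:\d{7}")  # GO IDs are "GO:" followed by seven digits


def read_gene2go(lines):
    """Map each SeqName to the union of GO IDs found on any of its rows."""
    gene2go = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "SeqName":  # skip blanks and the header
            continue
        # Genes without any GO ID still get an (empty) entry.
        gene2go[fields[0]].update(GO_RE.findall(line))
    return dict(gene2go)


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M, of which n carry the term."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def enrich(gene2go, subset):
    """Return (term, k, n, p) for every term seen in the subset, sorted by p."""
    background = set(gene2go)           # assumption: all genes in the file
    study = background & set(subset)    # ignore IDs missing from the file
    M, N = len(background), len(study)
    term2genes = defaultdict(set)
    for gene, terms in gene2go.items():
        for term in terms:
            term2genes[term].add(gene)
    rows = []
    for term, genes in term2genes.items():
        k = len(genes & study)
        if k:
            rows.append((term, k, len(genes), hypergeom_sf(k, M, len(genes), N)))
    return sorted(rows, key=lambda r: (r[3], r[0]))


# Toy input in the same layout as the table above (values are made up).
demo = [
    "SeqName    length    GOs",
    "g1    100    GO:0003677,GO:0015074",
    "g2    100    GO:0003677",
    "g3    100    GO:0016020",
    "g4    100    ",
]
for term, k, n, p in enrich(read_gene2go(demo), ["g1", "g2"]):
    print(term, k, n, p)
```

The upper tail of the hypergeometric distribution computed here is the same quantity as a one-sided Fisher's exact test on the corresponding 2x2 table. With many GO terms tested at once, the raw p-values would still need a multiple-testing correction, which this sketch leaves out.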

Do you have a text file or something that we can read? It is almost impossible to read the format here, and that makes things complicated.

written 2.4 years ago by Sam

Hi,

You can use the GO enrichment tool at PantherDB.org. Please check their paper in Nature Protocols (http://www.nature.com/nprot/journal/v8/n8/full/nprot.2013.092.html) for how to prepare the input file, etc.

R

written 2.4 years ago by Rama
2.4 years ago, EagleEye wrote:

Four steps to do this with GeneSCF (this will work with GeneSCF v1.0):

1) Tool:

http://genescf.kandurilab.org/downloads.php

Gene Set Clustering based on Functional annotation (GeneSCF)

2) Gene set database as a text file:

https://github.com/santhilalsubhash/geneSCF/tree/master/geneSCF-master-v1.0/annotation

Make your annotation (DB) file in the same flat-file format as these files, then put it in the annotation folder.
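
As a rough sketch of that conversion, the gene-to-GO table from the question can be inverted into one line per GO term. The exact layout GeneSCF expects is not stated in this thread, so the "term, tab, comma-separated genes" output below is only an assumption and should be checked against the example files in the annotation folder linked above. The helper names term_to_genes and format_flat are made up for illustration.

```python
import re
from collections import defaultdict

GO_RE = re.compile(r"GO:\d{7}")  # GO IDs are "GO:" followed by seven digits


def term_to_genes(lines):
    """Invert the per-hit table into GO term -> set of gene IDs."""
    term2genes = defaultdict(set)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "SeqName":  # skip blanks and the header
            continue
        for term in GO_RE.findall(line):
            term2genes[term].add(fields[0])  # first column is the gene ID
    return term2genes


def format_flat(term2genes):
    """One line per term. ASSUMED layout: term<TAB>gene1,gene2,..."""
    return [
        f"{term}\t{','.join(sorted(genes))}"
        for term, genes in sorted(term2genes.items())
    ]


# Two rows shortened from the table in the question.
sample = [
    "Anisa00001    1185    656.751    0.0    GO:0015074,GO:0006310,GO:0003677",
    "Anisa00003    297    176.022    7.21027E-56    GO:0043565,GO:0003677",
]
for row in format_flat(term_to_genes(sample)):
    print(row)
```

Genes without any GO ID simply drop out of this file, since they belong to no term.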

 

3) Code to edit:

https://github.com/santhilalsubhash/geneSCF/blob/master/geneSCF-master-v1.0/class/functional_class.pl

After line 103, add one more if condition for your database flat file. If you have only one type of gene ID, use only one of the two if blocks below.

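# Gene list given as Entrez Gene IDs ("gid") for your custom database
if ($ARGV[0] eq "gid" && $ARGV[3] eq "YOUR_DB_NAME")
{
    $mytype = "Entrez Gene ID";
    # IN1 is kept as the filehandle name used in this snippet; $! reports why open failed
    open(IN1, "$ARGV[5]/annotation/YOUR_DB_FILE_withEntrezGeneID.txt") or die "Error opening in file: $!";
}

# Gene list given as gene symbols ("sym") for your custom database
if ($ARGV[0] eq "sym" && $ARGV[3] eq "YOUR_DB_NAME")
{
    $mytype = "Gene Symbol";
    open(IN1, "$ARGV[5]/annotation/YOUR_DB_FILE_withGeneSymbol.txt") or die "Error opening in file: $!";
}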

4) Now you can run GeneSCF 

geneSCF -i=./your_listOfGenes_withgeneSym -db=YOUR_DB_NAME -o=./test/output -t=sym

 

2.4 years ago, joynathan2 wrote:

Thanks. I made it work
