Question

GO enrichment analysis

5.9 years ago

mxlsherry1992 ▴ 80

Dear all,

I want to run a GO enrichment analysis for some of my genes. I used KOBAS to do that, and here is part of the output (file1):

GO:0022839  5.51E-55    1.69E-51    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005254  1.59E-48    2.43E-45    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015108  3.50E-47    3.58E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:1902476  4.69E-47    3.60E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005253  6.27E-47    3.84E-44    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0006821  9.85E-46    5.03E-43    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0098661  7.54E-45    3.30E-42    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0004222  1.23E-44    4.71E-42    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015103  5.80E-43    1.98E-40    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0015698  2.57E-41    7.89E-39    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0008237  1.13E-38    3.14E-36    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0098656  7.71E-38    1.97E-35    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0008509  1.18E-37    2.78E-35    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022836  1.65E-34    3.61E-32    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0071456  8.21E-34    1.68E-31    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0036294  3.13E-33    6.01E-31    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0071453  1.12E-32    2.02E-30    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1
GO:0006820  2.09E-31    3.56E-29    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005216  3.49E-31    5.63E-29    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022838  8.25E-31    1.26E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0005902  1.83E-30    2.67E-28    TRINITY_DN304046_c0_g1_i1   TRINITY_DN62073_c0_g1_i1    TRINITY_DN102311_c6_g5_i1
GO:0015267  6.41E-30    8.93E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1
GO:0022803  6.85E-30    9.13E-28    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1

If I understand correctly, I should focus on terms with a corrected P-value < 0.05 (third column), right? My other question is about visualization. I found a website, "WEGO", but its input file needs the gene name/ID in the first column, followed by that gene's GO IDs, which is different from the format of file1. Does anyone know how I could convert file1 into that format? The WEGO input looks like this:

demo2000051 GO:0006470  GO:0008138  
demo2000063 
demo2000213 GO:0016706  GO:0019538  
demo2000262 
demo2000391 
demo2000401 GO:0008152  GO:0016787  
demo2000411 
demo2000672 GO:0005509  
demo2000691 GO:0005179  GO:0005576  
demo2001071 GO:0000166  
demo2001091 GO:0005509  
demo2001111 
demo2001131 
demo2001201 
demo2001431 
demo2001601 GO:0015031  GO:0015450  GO:0016020  
demo2001612

RNA-Seq • 1.5k views

ADD COMMENT • link 5.9 years ago by mxlsherry1992 ▴ 80

The typical way of visualizing GSEA results is a volcano-like plot: enrichment score on the x axis versus -log10(FDR or adjusted p-value) on the y axis. I haven't used WEGO, but if it has calculated these two values you can make your own plot.
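As a rough sketch of the y-axis part of such a plot, the snippet below reads rows shaped like the file1 excerpt and computes -log10 of the third column, which the question describes as the corrected P-value. The function name `neg_log10_fdr`, the whitespace-delimited parsing, and the 0.05 cutoff are assumptions for illustration. The excerpt contains no enrichment score, so the x-axis value would have to come from elsewhere.

```python
import math

def neg_log10_fdr(lines, alpha=0.05):
    """Return (go_id, -log10(corrected p), is_significant) for each GO row.

    Assumes whitespace-delimited rows with the GO ID in column 1 and the
    corrected p-value in column 3, as in the file1 excerpt.
    """
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("GO:"):
            continue  # skip blank lines and anything that is not a GO row
        corrected_p = float(fields[2])
        # Note: a corrected p-value of exactly 0 would raise ValueError here.
        points.append((fields[0], -math.log10(corrected_p), corrected_p < alpha))
    return points

# Two rows copied from the file1 excerpt (gene columns shortened to one).
rows = [
    "GO:0022839  5.51E-55    1.69E-51    TRINITY_DN54568_c0_g1_i1",
    "GO:0005902  1.83E-30    2.67E-28    TRINITY_DN304046_c0_g1_i1",
]
points = neg_log10_fdr(rows)
for go_id, y_value, significant in points:
    print(f"{go_id}\t{y_value:.2f}\t{significant}")
```

The resulting values can be passed to any plotting library as the y coordinates.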

ADD REPLY • link 5.9 years ago by Buffo ★ 2.4k

Hi, thanks for the reply. But do you know how I could convert file1 to the second format? There are too many genes, so it is impossible for me to do it one by one :(

ADD REPLY • link 5.9 years ago by mxlsherry1992 ▴ 80

It looks like each row is a gene, and the further columns are the GO IDs related to that gene, so you can try making a dictionary in Python: the first column gives the keys, and you append the further columns as a list of values. Finally, print the dictionary key by key, with each key and its values delimited by spaces. I recommend using WebGestalt's online page for GSEA analysis; it is fast and very easy to use.
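A minimal sketch of the dictionary approach follows. In the file1 excerpt as posted, each row starts with a GO ID and lists gene IDs after the two P-value columns, so this version inverts the table: gene IDs become the keys and GO IDs the values. The function names `invert_go_table` and `format_wego`, the whitespace parsing, the optional 0.05 filter on the corrected P-value, and the tab-separated output are all assumptions; check WEGO's documentation for the exact delimiter it expects.

```python
from collections import defaultdict

def invert_go_table(lines, alpha=0.05):
    """Map each gene ID to the GO IDs whose corrected p-value is below alpha.

    Assumes rows of the form: GO_ID  p_value  corrected_p  gene1  gene2 ...
    """
    gene_to_go = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4 or not fields[0].startswith("GO:"):
            continue  # skip blank lines, headers, and rows without genes
        go_id, corrected_p, genes = fields[0], float(fields[2]), fields[3:]
        if corrected_p >= alpha:
            continue  # keep only terms passing the (assumed) cutoff
        for gene in genes:
            if go_id not in gene_to_go[gene]:
                gene_to_go[gene].append(go_id)
    return dict(gene_to_go)

def format_wego(gene_to_go):
    """Return one line per gene: gene ID followed by its GO IDs."""
    return ["\t".join([gene] + go_ids) for gene, go_ids in gene_to_go.items()]

# Two rows copied from the file1 excerpt.
example = [
    "GO:0022839  5.51E-55    1.69E-51    TRINITY_DN54568_c0_g1_i1    TRINITY_DN130760_c2_g2_i8   TRINITY_DN130760_c2_g1_i1",
    "GO:0071456  8.21E-34    1.68E-31    TRINITY_DN108332_c0_g2_i1   TRINITY_DN130760_c2_g1_i1   TRINITY_DN30116_c0_g1_i1",
]
result = invert_go_table(example)
for row in format_wego(result):
    print(row)
```

For a real file, the list of example rows can be replaced with an open file handle, since the function only needs an iterable of lines.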

ADD REPLY • link 5.9 years ago by Buffo ★ 2.4k