Advice needed for GO enrichment
9.5 years ago
dankwc2000 ▴ 20

Hello,

I want to run GO enrichment (http://geneontology.org/) for the first time on my E. coli proteomic hits. I have 211 UniProt IDs, but after submitting them only 4 are mapped.

The 4 IDs that mapped successfully are:

P39356
P0A8A0
P0AAB8
P0AAI3

Here is a sample list of IDs that didn't match:

P0A954
A0A061KCV1
P0AGK6
A0A066SYL5

I have also tried converting the UniProt IDs to UniProt gene names and searching with those on GO, which improved the number of mapped IDs to 129 out of 211.

A sample of gene names that mapped successfully:

ampC
accA
yeaG
atpF

Here is a sample of gene names that did not map:

bla
groL1
hlyA
AC789_1c11180
BY96_12590

If someone can shed some light on why the mapping is failing, that would be great. Thanks.

The full list of IDs I want to submit is below:

A0A0E2TM38
P0AED2
A0A061L549
A0A066SXV7
P0A6B3
A0A024L7L1
A0A066T8X6
B7LIW9
A0A025FZ31
A0A0A1A6C7
A0A066T289
A0A023Z5Q1
P0A4U5
A0A0A0GRX6
A0A066T2U8
A0A066T2K7
B7MI03
A0A061L2L2
B7NCN8
A0A066SYL5
A0A0D8VW66
A7ZSC7
A0A0A1AAF9
P0AAB8
A0A061YGI3
A0A0E0V5W3
A0A061YFX7
A0A026GYM0
C8CGJ0
A0A023L2I8
A0A028AHG2
B7UIL1
A0A0B1F2D6
A0A066T1C1
P0AB82
C3TIN2
B7MV94
A0A0D8W5Y7
A0A066SU35
A0A066SPP0
A0A0A1A0U0
A0A061YKW8
A0A066R9J0
E2XHV6
B7MKM9
A0A027TKW3
P0AAI3
P39356
B7UI43
C6EFG9
A0A028ED46
A0A0E0V778
A0A061YA81
A0A0C8R7N1
A0A066T0N8
A0A0A2RRQ4
A0A066RHH7
A0A061L049
A0A0C5EZJ5
A0A061L5F7
A0A023YSY5
P0A954
P0DMC8
A0A0A0FY80
A0A066SZS3
A0A061YP03
B1LJ43
A0A0A1AAR5
A0A066T149
A0A061YFJ3
A0A0A6RYT3
A7ZU66
A0A0E2TRU3
A0A061KX09
B1VCI2
A0A061Y7T9
A7ZUL0
A7ZTU4
A0A0E2LNZ8
K4XHA3
B7MC90
C3TJ62
A7ZTU8
A0A061KQ46
A0A066SN12
P0AAI7
C3SJ47
A0A024L4V5
A0A024L616
H9URL8
R6VNF8
A0A066SZY4
A7ZTJ2
P0ABC5
A0A0E1LC25
A0A061KAB2
A0A061KI80
B3HJ98
A0A066T1G8
A0A0D6IKJ0
A0A077Z3W0
A8AQJ0
A0A061KCV1
B1P7H4
A1AJ51
E2QLY1
A0A066RGX5
A0A0E1SWP7
Q1RFA5
A0A061L7G7
H4J4S0
A7ZTU3
C7S9T0
A7ZPD1
A0A0E2TRQ1
A0A0E2TMC9
A0A037YGF3
E2QGQ2
Q0TKK5
A0A0E0U2X6
A0A0A1ADZ6
B7N0Y3
B7N2J0
P13661
A0A066SWG2
A0A0E0TX22
A0A0B0W2Q7
A0A061KDP5
A0A061YDR4
A0A027U8D8
A0A028AFZ4
B1LJ51
Q5MAJ8
B7N5S1
A0A066SWC5
P0ACJ2
E2QFX6
A0A066SSX9
A0A066T686
A0A024L8V5
P0A4L6
A0A0B0XUI7
P0AEU9
P0AFL5
B7MQ57
A7ZK01
W9AM67
A0A066QLF8
A0A0E1LA67
A0A0B0VCE8
A0A0E1LDD5
A0A0A1A6P7
B7MQR5
C3TF32
Q1R2T5
Q8VR39
A0A061L3B1
A0A0D6IRY9
A0A0E0VCW7
A1AB32
H9UQ82
R6U580
A0A066SNK8
B7UIS4
C3T5A2
A0A027U015
E2XC85
A0A061YL54
A0A061YKR0
A0A061KHZ9
A0A066SVB9
A0A066SS32
A0A0E1SXW8
A0A061Y578
A0A0E1T6X3
A0A061KA72
A0A066T4W3
A0A023LIJ5
A0A066SS33
P33219
P0AE93
E2QQC2
A0A061KE35
P0AGK6
E2QIN3
A0A066T755
B7MX29
A0A075L5G2
A0A027ZL67
A0A061YI78
A7ZS64
A7ZHR1
A0A0E1LIN1
A0A025FQJ4
A0A066SXJ0
A0A0F3SJZ5
A0A061YC46
A0A061YEH2
P0A9Z3
A7ZTR0
A0A061YH54
A7ZHS5
B7N7L4
B1LIL4
A0A061KXL2
A0A066RDY2
J7Q7B1
A0A0A6VHK7
A0A027TTG1
P0A8A0
A8ARN6

Proteomics • 2.3k views

Please move the IDs to a GitHub gist. Pasting a list here only serves to drive people away from the question - it is excessive information not necessary to the question.

A better approach would be to provide 2 lists: the IDs that map, and a small sample (maybe 5 items) of the IDs that don't map. That way, you can look for attributes that are common within each group but differ between the groups, and figure out the root cause.
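As a rough illustration of that comparison, here is a minimal Python sketch that summarises each group by accession length and first character, using the sample IDs from the question. The function name `describe` is made up for this example, and these two attributes are only a starting point.

```python
from collections import Counter

def describe(ids):
    """Count accessions by (length, first character)."""
    return Counter((len(acc), acc[0]) for acc in ids)

# Sample IDs copied from the question.
mapped = ["P39356", "P0A8A0", "P0AAB8", "P0AAI3"]
unmapped = ["P0A954", "A0A061KCV1", "P0AGK6", "A0A066SYL5"]

print("mapped:  ", dict(describe(mapped)))
print("unmapped:", dict(describe(unmapped)))
```

On this sample, the unmapped group contains both 6-character and 10-character accessions, so the format alone does not separate the groups. Other attributes, such as the strain each entry belongs to, may be worth comparing next.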


Hi,

There are many different tools that offer GO enrichment, such as DAVID, PANTHER, and GraphiteWeb, so you could try those. I would also suggest converting your list from protein IDs to the gene symbols (Entrez) of the genes that encode those proteins.

In my experience, you can sometimes have a very long list of valid IDs and still have too little statistical support to get a meaningful result.

Best regards!

9.5 years ago
dago ★ 2.8k

If the genome of your E. coli strain is publicly available, you can get the GO annotation from QuickGO. Basically, download the annotation for your strain and then extract the genes in your list.
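The extraction step could look something like the sketch below, assuming the annotation was downloaded in tab-separated GAF format (accession in column 2, GO ID in column 5, comment lines starting with `!`). The function name `go_terms_for` and the sample rows, including the gene symbols and GO IDs, are invented for illustration only.

```python
def go_terms_for(gaf_lines, wanted_ids):
    """Return {accession: set of GO IDs} for the accessions in wanted_ids."""
    wanted = set(wanted_ids)
    found = {}
    for line in gaf_lines:
        # Skip GAF header/comment lines and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        accession, go_id = cols[1], cols[4]
        if accession in wanted:
            found.setdefault(accession, set()).add(go_id)
    return found

# Invented rows in GAF column order (placeholder symbols and GO IDs).
sample = [
    "!gaf-version: 2.2",
    "UniProtKB\tP0A8A0\tgeneA\t\tGO:1111111\tREF:1\tIEA\t\tP",
    "UniProtKB\tP0A8A0\tgeneA\t\tGO:2222222\tREF:1\tIEA\t\tF",
    "UniProtKB\tQ00000\tgeneB\t\tGO:3333333\tREF:1\tIEA\t\tC",
]

hits = go_terms_for(sample, ["P0A8A0", "A0A061KCV1"])
print(hits)
```

With a real download you would pass an open file handle instead of `sample`. Accessions from your list that are missing from the result have no annotation in that file.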

Otherwise, you could try annotating your genes with Blast2GO or running them through InterProScan (there is also a standalone version of the latter).

For the enrichment itself there are many nice tools. I personally like the ones in Bioconductor (R) the most, e.g. topGO.
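To give an idea of what such tools compute, here is a minimal Python sketch of the one-sided hypergeometric (Fisher's exact) test that is commonly used for over-representation analysis. It is not topGO's implementation, and the function name `enrichment_p` and the numbers in the usage line are hypothetical.

```python
from math import comb

def enrichment_p(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the GO term.

    N: annotated genes in the background
    K: background genes annotated with the term
    n: genes in the hit list
    k: hit-list genes annotated with the term
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Hypothetical numbers: 4000 background genes, 50 with the term,
# 129 hits, 6 of which carry the term.
print(enrichment_p(4000, 50, 129, 6))
```

In practice this is repeated for every GO term, and the resulting p-values need multiple-testing correction.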

You can get an idea checking previous posts:
