Hi, I ran some association analyses using gene abundance data from HUMAnN2 output, and the top results contain hundreds of gene entries. By mapping them back to UniProt, I can see which species each entry comes from and which protein it encodes. Is there a way to clump together all gene entries that encode the same protein, so that the results are easier to annotate?
For example, these 4 gene entries come from different organisms but all encode an HTH cro/C1-type domain-containing protein. Thanks!
- R5DNF1 Parabacteroides johnsonii CAG:246
- R6K7I4 Eubacterium sp. CAG:252
- H1CLY0 Lachnospiraceae bacterium 7_1_58FAA
- D4C9T9 Clostridium sp. M62/1
Hi, thanks! The problem is that the protein names are usually not exactly the same, so it's a bit hard to do this for all proteins...
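To make the problem concrete, here is a minimal Python sketch of the kind of clumping I mean. It assumes I already have (accession, protein name) pairs from the UniProt mapping. The function names `normalize_name` and `clump_by_protein` are made up for illustration, and the spelling variants in the example names are invented to show the mismatch (in reality these four entries share one name). It only catches trivial differences such as case, hyphens, and slashes, so genuinely different wordings for the same protein would still end up in separate groups.

```python
import re
from collections import defaultdict


def normalize_name(name):
    """Lowercase the name and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def clump_by_protein(entries):
    """Group UniProt accessions by normalized protein name.

    entries: iterable of (accession, protein_name) pairs.
    Returns a dict mapping normalized name -> list of accessions.
    """
    groups = defaultdict(list)
    for accession, name in entries:
        groups[normalize_name(name)].append(accession)
    return dict(groups)


# Accessions are from the example above; the name variants are hypothetical.
entries = [
    ("R5DNF1", "HTH cro/C1-type domain-containing protein"),
    ("R6K7I4", "HTH cro/C1-type domain containing protein"),
    ("H1CLY0", "HTH Cro/C1-type domain-containing protein"),
    ("D4C9T9", "HTH cro/C1-type  domain-containing protein"),
]

for name, accessions in clump_by_protein(entries).items():
    print(name, accessions)
```

With exact string matching these four would form four groups; after normalization they fall into one.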