Hey all,
I am using a metaproteomic database of the mouse gut microbiome that I found online: http://gigadb.org/dataset/view/id/100114/token/mZlMYJIF04LshpgP.
Unfortunately, the accession numbers and protein descriptions are not very helpful for taxonomic analyses, because they look like this:
S-Fe7_GL0014216 [gene] locus=scaffold66956_1:1:1053:+ [Lack both ends] codon-table.11
The accessions follow no taxonomic pattern, and the database is too big to open in Excel.
The dataset owners also included a text file that explains each accession, e.g.:
S-Fe7_GL0014216 1/1 Clostridiales order root|cellular organisms|Bacteria|Firmicutes|Clostridia|Clostridiales no rank|no rank|superkingdom|phylum|class|order
Is there a join command or a script that loops through the database file and replaces each matching accession with its actual taxonomic description?
Thanks in advance for your help!
Cheers
Time to switch to R/Python :-)
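A minimal Python sketch of that idea, assuming the annotation file is tab-separated with the columns seen in the example line (accession, a count such as 1/1, taxon name, rank, lineage, lineage ranks) and that each FASTA header starts with the accession. The script name `relabel.py` and the file names in the usage comment are placeholders. It loads the annotation file into a dictionary, then streams the FASTA file and rewrites each header to keep the accession and append the taxonomy.

```python
import sys
from typing import Dict, Iterable, Iterator


def load_taxonomy(lines: Iterable[str]) -> Dict[str, str]:
    """Map accession -> taxonomy label from the annotation text file.

    ASSUMPTION: tab-separated columns in the order
    accession, count (e.g. 1/1), taxon name, rank, lineage, lineage ranks.
    Adjust the indices below if the real file is laid out differently.
    """
    taxonomy: Dict[str, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        accession, name, rank, lineage = fields[0], fields[2], fields[3], fields[4]
        # Drop the uninformative "root|cellular organisms" prefix of the lineage.
        taxa = [t for t in lineage.split("|") if t not in ("root", "cellular organisms")]
        taxonomy[accession] = f"{name} [{rank}] {';'.join(taxa)}"
    return taxonomy


def relabel_fasta(fasta: Iterable[str], taxonomy: Dict[str, str]) -> Iterator[str]:
    """Yield FASTA lines, rewriting each header as '>accession <taxonomy>'."""
    for line in fasta:
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            accession = parts[0] if parts else ""
            label = taxonomy.get(accession, "unclassified")
            yield f">{accession} {label}\n"
        else:
            yield line  # sequence lines pass through unchanged


def main(annotation_path: str, fasta_path: str, out_path: str) -> None:
    with open(annotation_path) as ann:
        taxonomy = load_taxonomy(ann)
    # Stream the FASTA file so only the annotation table is held in memory.
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(relabel_fasta(src, taxonomy))


if __name__ == "__main__" and len(sys.argv) == 4:
    # usage: python relabel.py annotation.txt proteins.fasta relabeled.fasta
    main(*sys.argv[1:])
```

With the example lines above, the header would become `>S-Fe7_GL0014216 Clostridiales [order] Bacteria;Firmicutes;Clostridia;Clostridiales`. Keeping the accession in front keeps the headers unique for search engines. Unix `join` could also work, but it needs both inputs sorted on the key and works line by line, so multi-line FASTA records would first have to be put on single lines; a dictionary lookup avoids both steps. Holding a few million accessions in a dictionary should fit in memory on a typical workstation.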