I've searched extensively on Google and on the SEQanswers and Biostars forums, and I don't believe this question has been covered (in this depth) before.
I've used WebMGA (http://weizhong-lab.ucsd.edu/metagenomic-analysis/server/kog/) for KOG functional annotation of my transcriptome (FASTA file), but I need some help interpreting and integrating the results.
Briefly, I did a de novo assembly and annotated it using dammit. I used this as input for WebMGA, which produced a number of output files. With the help of a previous Biostars post (which links to this Stack Overflow post), I was able to use one of the output files (output.2.class: counts by class) to make a histogram of KOG classes from my assembly.
However, I would also like to append the KOG results (I think from output.2: long table of rpsblast hits?) to my transcriptome. In other words, in addition to the UniRef90 gene names that dammit appended to my transcriptome, I want the KOG information as well.
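Since it is not certain which column of output.2 holds what, one way to check is to print the first few rows with every field numbered from 1, the same numbering awk uses for $1, $2 and so on. This is a minimal sketch in Python that assumes output.2 is tab-separated; the script name show_columns.py is only a placeholder.

```python
import itertools
import sys


def show_columns(line, sep="\t"):
    """Return (1-based index, value) pairs for one row of a delimited table.

    The 1-based numbering matches awk's $1, $2, ... so the output shows
    which column number holds the KOG description. Assumes tab separation.
    """
    fields = line.rstrip("\n").split(sep)
    return list(enumerate(fields, start=1))


if __name__ == "__main__":
    # Usage: python show_columns.py output.2
    for path in sys.argv[1:]:
        with open(path) as handle:
            # Inspect only the first three rows of each file.
            for line in itertools.islice(handle, 3):
                for index, value in show_columns(line):
                    print(f"${index}\t{value}")
                print()
```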
I understand I will likely need to write a custom script to do this; however, I have very little scripting experience. Most of my programming has been in R, which is why I was able to adapt the Stack Overflow R code fairly easily. Is there a simple way to do this on the Linux command line? If someone has experience attaching functional annotations (KOG or even GO) to assemblies, I would really appreciate your support and tutelage. Similarly, if code exists for a similar (but not identical) task, I would be willing to try cannibalizing/repurposing the script.
A frustrated but motivated student
P.S. To show you that I have been thinking about how to go about this, let me describe what I want (albeit in pseudocode):
1) Look at the FASTA file and the KOG output (first column, $1), and when the sequence IDs match,
2) extract a subset of the KOG output (the description column, $11) with awk and append it to the end of that header in the FASTA file.
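The pseudocode above can be sketched in Python roughly as follows. It is a minimal sketch under several assumptions: output.2 is tab-separated with the query ID in column 1 and the description in column 11 (as in the pseudocode), lines starting with # are comments, and the first word of each FASTA header is the same ID that WebMGA reports in column 1. The KOG= tag and the script name add_kog.py are hypothetical choices.

```python
import sys


def parse_kog_table(lines, id_col=1, desc_col=11):
    """Map each query ID (awk $1) to its KOG description(s) (awk $11).

    Column numbers are 1-based, like awk's. Assumes tab-separated rows.
    A transcript can have several rpsblast hits, so the unique descriptions
    are kept in first-seen order and joined with "; ".
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and (assumed) '#' comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < max(id_col, desc_col):
            continue  # row too short to contain both columns
        query, desc = fields[id_col - 1], fields[desc_col - 1]
        descs = hits.setdefault(query, [])
        if desc not in descs:
            descs.append(desc)
    return {query: "; ".join(descs) for query, descs in hits.items()}


def annotate_fasta(lines, annotations):
    """Yield FASTA lines, appending ' KOG=<description>' to matching headers.

    The sequence ID is assumed to be the first whitespace-delimited word
    after '>'. Sequence lines and unmatched headers pass through unchanged.
    """
    for line in lines:
        if line.startswith(">"):
            header = line.rstrip("\n")
            words = header[1:].split(maxsplit=1)
            seq_id = words[0] if words else ""
            if seq_id in annotations:
                header = f"{header} KOG={annotations[seq_id]}"
            yield header + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python add_kog.py output.2 transcripts.fasta > annotated.fasta
    with open(sys.argv[1]) as table:
        kog = parse_kog_table(table)
    with open(sys.argv[2]) as fasta:
        sys.stdout.writelines(annotate_fasta(fasta, kog))
```

One design note: awk splits fields on any run of spaces or tabs by default, so a multi-word description would be cut down to its first word in $11; an awk version would need -F'\t' when reading the table. Because a transcript can have several rpsblast hits, the sketch joins the unique descriptions rather than keeping only one; filtering to the best hit first would be a reasonable alternative.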