Dear all,
We have RNA-Seq data from a plant without a reference genome, so we assembled its transcriptome de novo using Trinity. We now have an annotation file for this transcriptome, and I want to generate a GO (Gene Ontology) functional classification using Blast2GO. However, re-mapping my BLAST XML results in Blast2GO is very time-consuming (I tried it with the ~200,000 transcripts in my transcriptome).
An alternative way to generate the GO functional classification is to import an annotation file (with the suffix .annot, according to the Blast2GO manual). The .annot file looks like the example below (you can also get this example .annot file at http://www.blast2go.com/b2glaunch/resources , in b2g_example_files.zip):
p.s. If you can't see this picture from Flickr, I also put it on GitHub
But our annotation file was not generated by Blast2GO; it is a CSV file like the one below (a little messy):
X01_query_id, X06_hit_title, X07_molecular_function, X08_biological_process, X09_cellular_component
##header of the CSV file
comp1000113_c0_seq1, Cc-nbs resistance protein [Medicago truncatula], GO:0043531 // ADP binding;GO:0005524 // ATP binding;GO:0017111 // nucleoside-triphosphatase activity, GO:0006952 // defense response,
comp10001_c0_seq1, Pistil-specific extensin-like protein [Medicago truncatula], , ,
comp1000255_c0_seq1, F-box protein [Medicago truncatula], , ,
comp1000736_c0_seq1, Alpha-L-arabinofuranosidase [Medicago truncatula], GO:0046556 // alpha-N-arabinofuranosidase activity, GO:0046373 // L-arabinose metabolic process,
comp1000860_c0_seq1, Protein kinase [Medicago truncatula], GO:0005524 // ATP binding;GO:0004674 // protein serine/threonine kinase activity, ,
Could you please help me with a command or a script that generates a .annot file like the one below, ignoring the lines without GO IDs?
p.s. If you can't see this picture from Flickr, I also put it on GitHub
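To make the request concrete, here is a minimal Python sketch of the conversion I have in mind. It rests on several assumptions that should be checked against the example file in b2g_example_files.zip: that a .annot file is tab-separated with the sequence ID first, one GO ID per line second, and an optional description third; that the real CSV has one record per line; and that hit titles contain no unquoted commas. The names csv_to_annot and convert_file are hypothetical, chosen only for illustration.

```python
import csv
import re

GO_ID = re.compile(r"GO:\d{7}")  # GO accessions are "GO:" followed by seven digits


def csv_to_annot(lines, with_title=True):
    """Yield .annot lines (query ID <TAB> GO ID [<TAB> hit title]) from CSV lines.

    Assumes one record per line. Rows without any GO ID produce no output.
    """
    # Drop blank lines and comment lines such as "##header of the CSV file".
    data = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#"))
    for row in csv.reader(data, skipinitialspace=True):
        query_id = row[0].strip()
        if query_id == "X01_query_id":  # skip the column-name row
            continue
        title = row[1].strip() if len(row) > 1 else ""
        # Search the three GO columns together; dict.fromkeys drops
        # duplicate IDs while keeping their original order.
        go_ids = dict.fromkeys(GO_ID.findall(" ".join(row[2:])))
        for go_id in go_ids:
            fields = [query_id, go_id]
            if with_title and title:
                fields.append(title)  # assumed optional third column
            yield "\t".join(fields)


def convert_file(csv_path, annot_path):
    """Hypothetical helper: read csv_path and write the .annot lines to annot_path."""
    with open(csv_path, newline="") as src, open(annot_path, "w") as dst:
        for line in csv_to_annot(src):
            dst.write(line + "\n")
```

Calling convert_file("annotation.csv", "annotation.annot") (both file names are placeholders) would write one line per transcript and GO ID pair. If Blast2GO does not accept the third column, passing with_title=False to csv_to_annot leaves it out.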
I look forward to hearing from all of you soon.
Thank you and best regards,
lzsph