I'm using a trial of CLC Workbench for assemblies. I would like to enter my assembled fa files into MG-RAST. However, CLC Workbench gives files in the form of:
>sequence_1 Average coverage: 5.6
ACCAGCGTTCTCTACACA
>sequence_2 Average coverage: 6.4
GTTATACAGGATAAGAATC
And so forth (of course, my contigs are much longer). MG-RAST requests a format such as:
>sequence_1_[cov=5.6]
ACCAGCGTTCTCTACACA
>sequence_2_[cov=6.4]
GTTATACAGGATAAGAATC
It is easy enough to get halfway there with the command below (where BG1.fa is my input file and BG1con.fa is the new output file):
<BG1.fa sed 's/ Average coverage: /_[cov=/g' >BG1con.fa
This gets me to the following fa format:
>sequence_1_[cov=5.6
ACCAGCGTTCTCTACACA
>sequence_2_[cov=6.4
GTTATACAGGATAAGAATC
But I just cannot get that last little bracket at the end. I've tried a couple of things, but it always puts the bracket on a new line such as:
>sequence_1_[cov=5.6
]
ACCAGCGTTCTCTACACA
>sequence_2_[cov=6.4
]
GTTATACAGGATAAGAATC
I apologize, as I am brand new to sed and it is still pretty confusing to me.
Any idea how to elegantly (or not) get the last bracket up onto the header line?
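
One possible direction, sketched under an assumption I have not verified: if the file has Windows-style (CRLF) line endings, each header line ends in a hidden carriage return, and anything appended after it would show up as if it were on a new line. The sketch below strips carriage returns first and then captures the coverage value so the closing bracket can be placed right after it. The file names demo.fa and demo_con.fa are made-up stand-ins, and the sample input is built by hand to mimic the CLC output.

```shell
# Hypothetical sample input mimicking the CLC output, written with
# CRLF line endings (an assumption about what the real file contains).
printf '>sequence_1 Average coverage: 5.6\r\nACCAGCGTTCTCTACACA\r\n>sequence_2 Average coverage: 6.4\r\nGTTATACAGGATAAGAATC\r\n' > demo.fa

# 1. tr removes every carriage return, leaving plain LF line endings.
# 2. sed touches header lines only (/^>/), captures everything after
#    " Average coverage: " up to the end of the line, and re-emits it
#    wrapped as _[cov=...].
tr -d '\r' < demo.fa | sed '/^>/ s/ Average coverage: \(.*\)$/_[cov=\1]/' > demo_con.fa

cat demo_con.fa
```

If the real file turns out to have plain LF endings, the tr step is harmless and the sed expression alone should still close the bracket on the header line.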