Q1: add > to the header; Q2: merge the two header lines into one line, keeping a space between them; Q3: remove the space between header and sequence
MT657978.1
Acaulospora foveata isolate
AAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTTC
MT626044.1
Claroideoglomus etunicatum
ACATACGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAA
Hi, everyone. I have a sequence file like the one above, but I want it to look like the one below:
>MT657978.1 Acaulospora foveata isolate
AAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTTC
>MT626044.1 Claroideoglomus etunicatum
ACATACGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGAAGGCAGCAGGCGCGCAA
Use awk and operate on NR. Do that once for every first line and once for every second line in separate subshells, and paste the output from them with a blank space as separator. Then paste that result with the output of a similar awk operating on every third line, using a unique delimiter that you then replace with a newline character using sed.
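The awk/paste/sed approach described above could look roughly like the sketch below. It assumes every record is exactly three lines (accession, description, sequence) with no blank lines in between. The function name merge_three_line_records and the "@" delimiter are illustrative choices, not something from the thread, and "@" must not occur anywhere in the data. Temporary files stand in for the subshells so that it runs under plain sh.

```shell
# Sketch: rebuild FASTA from strict 3-line records (accession, description, sequence).
# merge_three_line_records and the "@" delimiter are illustrative assumptions.
merge_three_line_records() {
  _in=$1
  _dir=$(mktemp -d) || return 1
  # Every 1st line of a record: the accession, prefixed with ">".
  awk 'NR % 3 == 1 { print ">" $0 }' "$_in" > "$_dir/acc"
  # Every 2nd line of a record: the description.
  awk 'NR % 3 == 2' "$_in" > "$_dir/desc"
  # Every 3rd line of a record: the sequence.
  awk 'NR % 3 == 0' "$_in" > "$_dir/seq"
  # Join accession and description with a space, attach the sequence with "@",
  # then replace that "@" with a newline (portable sed form: backslash + newline).
  paste -d ' ' "$_dir/acc" "$_dir/desc" |
    paste -d '@' - "$_dir/seq" |
    sed 's/@/\
/'
  rm -rf "$_dir"
}
```

It would be called as merge_three_line_records input.txt > output.fasta, where both file names are placeholders.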
Later in the thread, OP says that they could have multiple sequence lines, so NR%x is not going to work. OP's data is quite mangled.
Hello, everyone. If the sequences look like this, this is another story. How do I add > to the header and remove the space? Using the previous script, I couldn't add > at all. Sorry, I am a novice. I have many questions related to this. I really appreciate your effort.
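For the multi-line case, where counting with NR%x breaks down, one possible sketch is to recognise the start of a record by its content instead of its position. This assumes each record starts with a line that looks like a versioned accession (letters, digits, a dot, a version number, as in MT657978.1), followed by one description line and then any number of sequence lines; blank lines and leading or trailing spaces are dropped. The function name fasta_from_accessions and the accession pattern are assumptions and would need adjusting to the real data.

```shell
# Sketch: rebuild FASTA when a record may span several sequence lines.
# fasta_from_accessions and the accession regex are illustrative assumptions.
fasta_from_accessions() {
  awk '
    # Trim leading and trailing blanks, then skip lines that are now empty.
    { gsub(/^[ \t]+|[ \t]+$/, "") }
    $0 == "" { next }
    # Accession line: start a new header with ">" and wait for the description.
    /^[A-Z]+[0-9]+\.[0-9]+$/ {
      if (want_desc) print ""      # previous header had no description
      printf ">%s", $0
      want_desc = 1
      next
    }
    # First line after an accession: append it to the header line.
    want_desc { printf " %s\n", $0; want_desc = 0; next }
    # Anything else is treated as a sequence line and printed unchanged.
    { print }
    END { if (want_desc) print "" }
  ' "$1"
}
```

Because header lines are found by pattern, the number of sequence lines per record no longer matters. A sequence line that happens to match the accession pattern would be misread as a header, which is unlikely for plain nucleotide lines but worth checking on mangled input.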