How to separate the name and date in a FASTA sequence header with spaces on the same line
7 weeks ago
Sam

Hello everyone, I have a DNA sequence dataset of 5000 sequences. Each sequence header contains the sequence name and date. I want to prepare a data file that contains the taxon names from the sequence alignment and their dates, separated by spaces. My header format is as follows:

>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015


The required format is (with spaces between the name and the date):

>A/Kilifi/100/2015|A /H3N2|1604249|   12/14/2015


I can't do this manually because it would take a lot of time. If anyone knows a quick way to insert the spaces, please let me know; it would be really appreciated. I am looking forward to a positive response. Thank you.
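A minimal sketch of one way to do this for the whole file with awk, assuming the date is always the last |-separated field of each header and that you want the three spaces shown in the required format. `pad_date` is a hypothetical helper name, and the width argument is an assumption for illustration, not part of any tool:

```shell
#!/usr/bin/env bash
# pad_date is a hypothetical helper name, not part of any tool.
# Usage: pad_date [N] < input.fasta > output.fasta
# On header lines only, inserts N spaces (default 3, as in the requested
# format) between the last "|" and the field after it (assumed to be the date).
pad_date() {
  local n="${1:-3}"
  awk -F '|' -v OFS='|' -v n="$n" '
    BEGIN { sp = sprintf("%" n "s", "") }    # a string of n spaces
    /^>/ && NF > 1 { $NF = sp $NF }          # headers with at least one "|": pad the last field
    { print }                                # print every line, edited or not
  '
}
```

For example, `pad_date 3 < sequences.fasta > sequences_spaced.fasta` (file names hypothetical). Sequence lines pass through unchanged because only lines starting with > are edited, and headers without any | are left alone.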

Have a look at sed: https://linux.die.net/man/1/sed

Thank you. I will check the link.

Can you use the command line? Have you tried any of the typical tools that would come to mind, e.g. awk or sed?

I previously used sed to remove the spaces in the header, but I could not manage to insert spaces between the name and the date.

$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015" | awk -F "|" -v OFS="|" '/^>/ {$NF=" "$NF}1'
>A/Kilifi/100/2015|A /H3N2|1604249| 12/14/2015
$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015" | sed -r '/^>/ s/(.*\|)/\1 /'
>A/Kilifi/100/2015|A /H3N2|1604249| 12/14/2015
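The awk version splits each line on | (-F), and assigning to the last field ($NF) makes awk rebuild the line with OFS, so the pipes are kept. With 5000 sequences it may be worth checking the result. Below is a sketch of such a check, assuming the edited output was written to a separate file; `check_headers` is a hypothetical helper that only counts header lines with grep:

```shell
#!/usr/bin/env bash
# check_headers is a hypothetical helper name.
# Usage: check_headers original.fasta edited.fasta
# Succeeds only if both files have the same number of header lines and every
# header in the edited file has a space right after its last "|".
check_headers() {
  local before after spaced
  before=$(grep -c '^>' "$1")
  after=$(grep -c '^>' "$2")
  # In a basic regex "|" is literal; "| [^|]*$" means: a "|", a space,
  # then no further "|" up to the end of the line (so it is the last "|").
  spaced=$(grep -c '^>.*| [^|]*$' "$2")
  echo "headers before: $before, after: $after, spaced: $spaced"
  [ "$before" -eq "$after" ] && [ "$after" -eq "$spaced" ]
}
```

For example: `awk -F "|" -v OFS="|" '/^>/ {$NF=" "$NF}1' sequences.fasta > sequences_spaced.fasta && check_headers sequences.fasta sequences_spaced.fasta` (file names hypothetical).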

7 weeks ago

Among other likely approaches, you can achieve this by running a Perl one-liner on it:

perl -pe 's/\|(\d+)\//| $1\//g' <your_file> > <new_file>


This matches a | followed by one or more digits and a / (in these headers, only the | in front of the date matches) and replaces it with the same text plus a space after the |. The digits captured between the ( ) are written back as $1.

Or with sed:

 sed -E 's/\|([0-9]+)\//\| \1\//g' <your_file> > <new_file>


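This follows roughly the same principle as the Perl version above. The -E flag is required for this syntax: it switches sed to extended regular expressions, where + means "one or more" and ( ) capture a group that can be re-used as \1. Without -E, both would be treated as literal characters.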
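The same last-| idea can be written for sed. A sketch, again assuming the date is the last |-separated field; `/^>/` limits the edit to header lines, and `space_last_field_sed` is a hypothetical function name:

```shell
#!/usr/bin/env bash
# space_last_field_sed is a hypothetical helper name.
# Usage: space_last_field_sed input.fasta > output.fasta  (or read from stdin)
space_last_field_sed() {
  # -E: extended regex, so ( ) capture without backslashes; "\|" is a literal pipe.
  # \|([^|]*)$ matches the last "|" and everything after it; "| \1" writes it
  # back with a space inserted after the pipe.
  sed -E '/^>/ s/\|([^|]*)$/| \1/' "$@"
}
```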

Thank you, lieven. I used both the Perl and the sed commands and got the result. I am really thankful for your time and help.

You're welcome.

However, do keep in mind that this was a free sample for a new user. :) From now on, you will need to show more of what you did or tried to resolve such things yourself. We're here to help with problems and issues, not to provide random snippets of code.

7 weeks ago
Hugo

You may have a look at SEDA (https://www.sing-group.org/seda/), a desktop tool with a GUI for performing different operations on FASTA files. The 'Rename header / Multipart header' (https://www.sing-group.org/seda/manual/operations.html#multipart-header) or the 'Rename header / Replace word' (https://www.sing-group.org/seda/manual/operations.html#replace-word) operations may allow you to perform this transformation after a bit of experimentation. Both operations include a preview to ease their configuration.

I will check the links you provided. I hope these manuals will help me understand these operations. Thanks.