Question

How to separate the name and date in a FASTA sequence header with spaces on the same line

0

22 months ago

Sam • 0

Hello everyone, I have a data set of 5000 DNA sequences. Each sequence header contains the sequence's name and date. I want to prepare a data file that contains the taxon names from the sequence alignment and the dates, separated by spaces. My header format is given below:

>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015

The required format is (space between name and date):

>A/Kilifi/100/2015|A /H3N2|1604249|   12/14/2015

I can't do it manually because it would take a lot of time. If anyone knows a quick method to insert the space, please let me know. It would be really appreciated. I am looking forward to a positive response. Thank you.

Sequence Header • 1.3k views

ADD COMMENT • link updated 22 months ago by cpad0112 21k • written 22 months ago by Sam • 0

1

Have a look at sed: https://linux.die.net/man/1/sed

ADD REPLY • link 22 months ago by Pierre Lindenbaum 161k

0

Thank you. I will check the link.

ADD REPLY • link 22 months ago by Sam • 0

0

Can you use the command line? Have you tried any of the typical tools that would come to mind, e.g. awk or sed?

ADD REPLY • link 22 months ago by Friederike 8.9k

0

I previously used sed to remove the spaces in the header, but I could not succeed in creating a space between the name and the date.

ADD REPLY • link 22 months ago by Sam • 0

0

$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015"  | awk -F "|" -v OFS="|" '/^>/ {$NF=" "$NF}1'

>A/Kilifi/100/2015|A /H3N2|1604249| 12/14/2015


$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015"  | sed -r '/^>/ s/(.*\|)/\1 /'
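
The two commands above add a single space, while the format requested in the question shows three. A minimal sketch of the same awk approach with the padding made adjustable; the function name `pad_last_field` and the sequence line are made up for illustration, and it assumes the date is always the last `|`-separated field of the header:

```shell
# Hypothetical helper: prefix the last |-separated field of each FASTA
# header with a padding string (first argument, default: one space).
# Sequence lines do not start with ">" and are printed unchanged.
pad_last_field() {
  awk -F '|' -v OFS='|' -v pad="${1:- }" '/^>/ { $NF = pad $NF } 1'
}

# Header from the question plus a placeholder sequence line.
printf '%s\n' '>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015' 'ATGAAGACTATC' |
  pad_last_field '   '
```

On a real file this could be run as `pad_last_field '   ' < your_file > new_file`, where both file names are placeholders.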

ADD REPLY • link 22 months ago by cpad0112 21k

1

22 months ago

Hugo ▴ 380

You may have a look at SEDA (https://www.sing-group.org/seda/), a desktop tool with a GUI to perform different operations on FASTA files. The 'Rename header / Multipart header' (https://www.sing-group.org/seda/manual/operations.html#multipart-header) or the 'Rename header / Replace word' (https://www.sing-group.org/seda/manual/operations.html#replace-word) operations may allow you to perform such transformation after playing a bit. Such operations include a preview to ease their configuration.

ADD COMMENT • link 22 months ago by Hugo ▴ 380

0

I will check the links provided by you. I hope these manuals will help me to understand these commands. Thanks

ADD REPLY • link 22 months ago by Sam • 0

score 2 · Accepted Answer · 2022-06-21

2

22 months ago

lieven.sterck 15k

Among likely other approaches, you can achieve this by running a Perl one-liner on it.

cat <your_file> | perl -p -e 's/\|(\d+)\//\| $1\//g' > <new_file>

This matches any | that is directly followed by digits and a / (in these headers, only the last |, just before the date) and replaces it with the same text, the digits being captured between the ( ), with a space inserted after the |.

Or with sed:

 sed -E 's/\|([0-9]+)\//\| \1\//g' <your_file> > <new_file>

Roughly the same principle as with Perl above. The -E switches sed to extended regular expressions, so that + and the grouping parentheses (match and reuse the matched part) can be written without backslashes.
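
A stricter variant, offered as a sketch rather than as a replacement for the commands above: it only touches header lines, and only when the header ends in a date. It assumes dates are written like 12/14/2015 (one or two digits, one or two digits, four digits) with nothing after them, not even a Windows carriage return; the function name `space_before_date` is hypothetical.

```shell
# Hypothetical helper: insert a space between the last | and a trailing
# M/D/YYYY date, on header lines only. "#" is used as the s-command
# delimiter so the slashes in the date need no escaping.
space_before_date() {
  sed -E '/^>/ s#\|([0-9]{1,2}/[0-9]{1,2}/[0-9]{4})$#| \1#'
}

# Header from the question plus a placeholder sequence line.
printf '%s\n' '>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015' 'ATGAAGACTATC' |
  space_before_date
```

Headers that do not end in such a date pass through unchanged, which makes unexpected formats easy to spot afterwards.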

ADD COMMENT • link 22 months ago by lieven.sterck 15k

0

Thank you, lieven. I used both the Perl and the sed commands and got the result. I am really thankful for your time and help.

ADD REPLY • link 22 months ago by Sam • 0

0

You're welcome.

However, do keep in mind this was a free tasting sample for a new user. :) From now on you will need to show more of what you did or tried to get such things resolved yourself. We're here to help out with problems and issues, not to provide random snippets of code.

ADD REPLY • link 22 months ago by lieven.sterck 15k