How to separate the name and date in a FASTA sequence header with spaces on the same line
7 weeks ago
Sam

Hello everyone, I have a DNA sequence data set of 5000 sequences. Each sequence header contains the sequence's name and date. I want to prepare a data file that lists the taxon names from the sequence alignment and their dates, separated by spaces. My header format is as follows:

>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015

The required format is (with spaces between the name and the date):

>A/Kilifi/100/2015|A /H3N2|1604249|   12/14/2015

I can't do this manually because it would take a lot of time. If anyone knows a quick method to insert the space, please let me know; it would be really appreciated. I am looking forward to a positive response. Thank you.


Have a look at sed: https://linux.die.net/man/1/sed


Thank you. I will check the link.


Can you use the command line? Have you tried any of the typical tools that would come to mind, e.g. awk or sed?


I have used sed before to remove the spaces in the header, but I could not manage to insert a space between the name and the date.

$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015"  | awk -F "|" -v OFS="|" '/^>/ {$NF=" "$NF}1'

>A/Kilifi/100/2015|A /H3N2|1604249| 12/14/2015


$ echo ">A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015"  | sed -r '/^>/ s/(.*\|)/\1 /'

>A/Kilifi/100/2015|A /H3N2|1604249| 12/14/2015
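
A minimal sketch that wraps the awk approach above so it can be run over a whole FASTA file, assuming every header keeps the date in its last |-separated field. The function name `add_date_space` and its padding argument are hypothetical; passing three spaces reproduces the format requested in the question. Headers without any `|` are left alone, because prepending the padding to their only field would put spaces in front of the `>`.

```shell
# Hypothetical helper: add_date_space PAD < in.fasta > out.fasta
# Prepends PAD to the last |-separated field of each header line.
# Sequence lines (and headers without any "|") are printed unchanged.
add_date_space() {
    awk -F '|' -v OFS='|' -v pad="$1" '
        /^>/ && NF > 1 { $NF = pad $NF }   # rebuilds the line with "|" as separator
        1                                  # print every line, modified or not
    '
}

# Example: three spaces, as in the requested header format
printf '>A/Kilifi/100/2015|A /H3N2|1604249|12/14/2015\nACGTACGT\n' |
    add_date_space '   '
```

For the actual data set, something like `add_date_space '   ' < your_file.fasta > new_file.fasta` (file names are placeholders) should do the job.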
7 weeks ago

Among other possible approaches, you can achieve this by running a Perl one-liner on the file:

perl -p -e 's/\|(\d+)\//\| $1\//g' <your_file> > <new_file>

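This matches a | followed by one or more digits and a /, and replaces it with what was matched (the digits captured between the parentheses), with a space inserted after the |. In headers formatted like the example above, the only place where a | is directly followed by digits and a / is in front of the date.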

Or with sed:

 sed -E 's/\|([0-9]+)\//\| \1\//g' <your_file> > <new_file>

This follows roughly the same principle as the Perl version above. The -E option (extended regular expressions) is needed so that + and the unescaped parentheses used for grouping (to match and re-use the matched part) work as written.

Thank you, lieven. I used both the Perl and the sed commands and got the result. I am really thankful for your time and help.

You're welcome.

However, do keep in mind that this was a free tasting sample for a new user. :) From now on, you will need to show more of what you did or tried in order to resolve such things yourself. We're here to help out with problems and issues, not to provide random snippets of code.

7 weeks ago
Hugo

You may have a look at SEDA (https://www.sing-group.org/seda/), a desktop tool with a GUI for performing different operations on FASTA files. The 'Rename header / Multipart header' (https://www.sing-group.org/seda/manual/operations.html#multipart-header) or the 'Rename header / Replace word' (https://www.sing-group.org/seda/manual/operations.html#replace-word) operation should let you perform this transformation after some experimentation. These operations include a preview to ease their configuration.


I will check the links you provided. I hope these manuals will help me understand these commands. Thanks.
