Question: Changing fasta headers in multifasta file
James20 (APHA Weybridge, UK) wrote 2.1 years ago:

Hi, please can anyone help me with this? I have a multifasta file that I want to make into a BLAST database, but the header of each sequence is not quite in the correct format. The multifasta has several hundred thousand sequences in it, so I really don't want to start again.

The correct format should be:

>unique-id|my sequence name|etc|etc

I currently have:

>notunique___|unique-id|my sequence name|etc|etc

I'm pretty sure this should be doable with the sed command, but I have no clue how to do it myself. I want to either just delete 'notunique___|' or replace '>notunique___|' with a new '>'.

'notunique' is a mix of letters and numbers, and its length varies from header to header.

Any help would be much appreciated.

Thank you James

modified 2.1 years ago by genomax80k • written 2.1 years ago by James20

Renaming Entries In A Fasta File

written 2.1 years ago by Sej Modha4.6k

Dear James, you may also be interested in SEDA (http://www.sing-group.org/seda/), a desktop application that incorporates many functions for processing FASTA files. One of those functions is "Rename header", which lets you change headers in different ways. Regards.

written 2.1 years ago by Hugo230

lieven.sterck7.2k (VIB, Ghent, Belgium) wrote 2.1 years ago:

You can use this Perl one-liner:

cat <yourFile> | perl -pi -e 's/>.+?\|/>/g'

On the other hand, there should be many other solutions already posted on Biostars ;-)

written 2.1 years ago by lieven.sterck7.2k
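
Since the question asks about sed, here is a minimal sketch of an equivalent sed command, assuming every header looks like the example in the question. The sample records, the scratch directory and the file names (input.fasta, renamed.fasta) are made up for illustration and are not from the thread.

```shell
# Work in a scratch directory so nothing real is overwritten.
tmpdir=$(mktemp -d)

# Two made-up records in the "current" format from the question.
printf '%s\n' \
  '>a1b2c3___|id1|my sequence name|etc|etc' 'ACGTACGT' \
  '>zz9___|id2|another name|etc|etc' 'TTGGCCAA' \
  > "$tmpdir/input.fasta"

# On lines starting with '>', drop everything up to and including the first '|'.
# In sed's basic regex syntax a bare | is a literal character.
sed 's/^>[^|]*|/>/' "$tmpdir/input.fasta" > "$tmpdir/renamed.fasta"

# Show the rewritten headers.
grep '^>' "$tmpdir/renamed.fasta"
```

Writing to a new file leaves the original untouched, so the result can be checked before building the database. For a real file, only the two paths in the sed line need to change.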

Thank you so much lieven.sterck, that worked perfectly.

While there seem to be lots of posts about this, I couldn't work out how to adapt the other answers to my exact problem. I really have no experience with these things.

Thanks again James

written 2.1 years ago by James20

@lieven.sterck this served the purpose for me, but can you explain this command, just for my understanding?

modified 16 months ago • written 16 months ago by hafiz.talhamalik230

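Sure.

The cat <yourFile> part simply prints the content of your file, which is then passed (via the | symbol, a.k.a. the Linux pipe) to the Perl one-liner perl -pi -e 's/>.+?\|/>/g', which does the substitution itself. It replaces every > together with the characters that follow it, up to and including the first |, with a single >. Because .+? is non-greedy, the match stops at the first | it meets, so in headers like the ones above only the leading 'notunique___|' part is removed. Sequence lines contain no > and pass through unchanged.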

written 16 months ago by lieven.sterck7.2k

Thank you very much.

written 16 months ago by hafiz.talhamalik230