Delete character from sequence id
3.1 years ago

Hi, I am trying to delete the NCBI accession numbers from the sequence ids in a fasta file.

Sequence IDs look like:

>Elytraria_mexicana_JQ691768.1

I am trying things like

sed 's/_*.*//' myfile.fasta

or

sed 's/_*.*//g' myfile.fasta

They don't work.

Have any of you done this before?

Thanks for any input,

sed • 766 views

I would try simply sed 's/_[A-Z].[0-9]*.[0-9]//g' myfile.fasta


You're using . as both a metacharacter and a literal .. Are you sure it will work reliably and the . that is supposed to match the literal . won't end up matching something else?


Yes, I agree, Ram. Here . may match any character. To make it more reliable, we can use \. instead. Thanks.
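
The difference between the two dots can be checked on a small example. This is only a sketch: the header below is made up, and the patterns are simplified versions of the one discussed above, anchored with $ to the end of the line.

```shell
# Hypothetical header in which the character before the version digit is "X", not "."
header='>Genus_species_AB123456X1'

# Unescaped dot: "." matches any character, so the "X" is accepted and the suffix is removed.
loose=$(printf '%s\n' "$header" | sed 's/_[A-Z]*[0-9]*.[0-9]$//')

# Escaped dot: "\." matches only a literal period, so this header is left untouched.
strict=$(printf '%s\n' "$header" | sed 's/_[A-Z]*[0-9]*\.[0-9]$//')

printf '%s\n%s\n' "$loose" "$strict"
```

The first command trims a suffix that is not a real accession.version pair, which is the kind of unintended match the escaped dot avoids.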


Thanks!! It works!!

sed -r 's/_[A-Z0-9]+[.][0-9]+//g' aligned_trnG-trnS.fasta > new_trnG-trnS.fasta

Just use cut:

 cut -d '_' -f 1,2 in.fasta
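
A quick sketch of what this does, and of the assumption it relies on: every header must consist of exactly two underscore-separated name parts before the accession. The two-record FASTA below is made up for illustration; the second header and its accession are hypothetical.

```shell
# Made-up two-record FASTA used only for illustration.
fasta='>Elytraria_mexicana_JQ691768.1
ACGTACGT
>Genus_species_var_name_AB000001.1
TTGACC'

# cut keeps fields 1 and 2 of every line; lines without "_" pass through unchanged.
trimmed=$(printf '%s\n' "$fasta" | cut -d '_' -f 1,2)
printf '%s\n' "$trimmed"
```

The sequence lines survive because cut prints lines that contain no delimiter as they are, but the second header loses "var_name" along with the accession, so this approach is only safe when all names have the same number of parts.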

Thank you so much!!
This command works:

sed -r 's/_[A-Z0-9]+[.][0-9]+//g' aligned_trnG-trnS.fasta > new_trnG-trnS.fasta

=D


Please stop adding answers. This content belongs as a reply to my comment. I'm moving it to a comment on the top level post now.

3.1 years ago
Ram 37k

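Your pattern does not do what you expect. In _*.* the _* part means "zero or more underscores" and the .* part means "any run of characters", so the pattern matches the whole line starting from its first character, and sed deletes the entire line (sequence lines included). To remove only the accession, the pattern has to start at a literal underscore and then describe the accession itself.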

Try sed 's/_[A-Z0-9]+[.][0-9]+//g' myfile.fasta


Hi Ram, thank you so much for your suggestion. This command

sed 's/_[A-Z0-9]+[.][0-9]+//g' myfile.fasta

doesn't work. I am now trying something like

sed 's/_+[A-Z]+[A-Z]+[0-9]+[0-9]+[0-9]+[0-9]+[0-9]+[0-9]+[.]+[0-9]//g' myfile.fasta

and it doesn't work either. Could you suggest a sed manual? Many thanks!


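Try sed -r (or sed -E) instead of plain sed with the first command. Without that option, sed uses basic regular expressions, where + is a literal plus sign rather than "one or more". The second command is unnecessarily verbose.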
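
The effect of the option can be seen on the header from the question. This is a minimal sketch, not a definitive recipe: the strip_accession helper is a hypothetical name, and restricting the substitution to header lines with /^>/ and anchoring it with $ are additions that were not part of the commands above.

```shell
header='>Elytraria_mexicana_JQ691768.1'

# Basic regex (no -E): "+" is a literal plus sign, so the header is unchanged.
basic=$(printf '%s\n' "$header" | sed 's/_[A-Z0-9]+[.][0-9]+//')

# Extended regex (-E): "+" means "one or more", so the accession is removed.
extended=$(printf '%s\n' "$header" | sed -E 's/_[A-Z0-9]+[.][0-9]+//')

# Hypothetical helper: strip a trailing "_ACCESSION.VERSION" from header lines only.
strip_accession() {
  # /^>/ limits the substitution to FASTA headers; "$" anchors it to the end of the line.
  sed -E '/^>/ s/_[A-Z0-9]+[.][0-9]+$//' "$@"
}

printf '%s\n%s\n' "$basic" "$extended"
printf '%s\nACGT\n' "$header" | strip_accession
```

With a file, the helper would be called as strip_accession myfile.fasta > new.fasta, leaving the sequence lines untouched.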
