Modify Fasta header
4.7 years ago
ysas ▴ 10

Hello

I have a FASTA file of amino acid sequences that were translated from nucleotide sequences with transeq.

>CE99543_15407_1
MAV.....
>CE99641_51257_1
MSQ......

I want to delete the _1 at the end of each FASTA header, because it was added by transeq.

>CE99543_15407
MAV.....
>CE99641_51257
MSQ......

Simply deleting every _1 would not work, because some headers also contain _1 in the middle.

Could you please tell me how to do it?

Thank you very much for your help.

4.7 years ago
sed  's/_1$//'
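
A minimal sketch of how that one-liner might be applied to a file, assuming GNU or POSIX sed. The function name `strip_transeq_suffix` and the file names in the comment are hypothetical placeholders. The `/^>/` address restricts the substitution to header lines, and `$` anchors `_1` to the end of the line, so a `_1` in the middle of a header is left alone.

```shell
# Hypothetical helper: remove a trailing "_1" from FASTA header lines only.
# Reads the files given as arguments, or standard input if none are given.
strip_transeq_suffix() {
    sed '/^>/ s/_1$//' "$@"
}

# Small demonstration on made-up headers, one of which has "_1" in the middle.
printf '>CE99543_15407_1\nMAV\n>CE_1_99641_1\nMSQ\n' | strip_transeq_suffix

# Typical use with files (names are placeholders):
#   strip_transeq_suffix input.fasta > output.fasta
```

If the file has Windows line endings, each line ends in a carriage return and `_1$` will not match until those are removed.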

Thank you very much! It helped me to solve the issue.

4.7 years ago
Mensur Dlakic ★ 27k

You need to include the end-of-line character \n in the search pattern, so that a _1 found anywhere else in the header is not replaced.

perl -pi -e 's/_1\n/\n/g' input_file


Will it need an option to specify that this is a multi-line regex? Would replacing _1$ with nothing not be better for a single line regex?
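
For what it's worth, `perl -p` applies the expression to one line at a time, so no multi-line modifier should be needed: without `/m`, `$` matches at the end of the string or just before its final newline. A sketch of the `$`-anchored variant follows; the function name `strip_suffix_perl` and the file name in the comment are hypothetical.

```shell
# Hypothetical helper: the "$"-anchored Perl variant, limited to header lines.
# -p loops over the input line by line and prints each line after the edit.
strip_suffix_perl() {
    perl -pe 's/_1$// if /^>/' "$@"
}

# Example use (file name is a placeholder):
#   strip_suffix_perl input.fasta > output.fasta
# To edit in place and keep a backup copy, perl's -i.bak switch can be used:
#   perl -pi.bak -e 's/_1$// if /^>/' input.fasta
```

Unlike a pattern ending in `\n`, this form also matches a final header line that has no trailing newline.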


Thank you very much! I have solved my issue.
