Protein text format conversion
8.5 years ago
fufuyou

Hi everyone,

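I have some protein sequences with a carriage return (line break) at the end of every line, so each sequence is split across several lines. But my program needs each protein name followed by its sequence on a single line.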

screenshot

Could you help me remove these carriage returns?

Thanks,
Fuyou


What exactly are the formats (from/to)? Why don't you post small examples?

8.5 years ago
13en

If I'm understanding you correctly, and you have protein sequences like this:

>prtn1
DTENKRK
KDFLTSE
NSLPRIS

and you want

>prtn1
DTENKRKKDFLTSENSLPRIS

you could use something like this

awk 'BEGIN {ORS=""}{if ($1 ~ /^>/) print "\n"$1"\n"; else print}END{print "\n"}' <protein file>

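Setting ORS="" (the output record separator) stops awk from adding a newline after each print, so the sequence lines are concatenated into one line. Lines starting with ">" are treated as IDs and printed on their own line by surrounding them with "\n" newline characters. This handles multiple sequences, but because a newline is printed before every ID, including the first, the output starts with a blank line; if your file only has one sequence, you might want to replace "\n"$1"\n" with $1"\n". Note also that $1 is only the first whitespace-separated word of the header line, so any description after the ID is dropped; use $0 instead if you want to keep the whole header.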
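The question mentions carriage returns, which suggests the file may have Windows (CRLF) line endings; the command above would keep those \r characters inside the joined sequence. As a minimal sketch, assuming a POSIX-compatible awk and headers that start with ">", the variant below strips a trailing \r from each line, keeps the whole header line, and prints the separating newline only before the second and later headers, so the output has no leading blank line. The function name unwrap_fasta is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: unwrap FASTA records so that each record becomes one
# header line followed by one sequence line. Reads the named files, or stdin.
unwrap_fasta() {
  awk '
    { sub(/\r$/, "") }               # strip a trailing CR (Windows line endings)
    /^>/ {                           # header line
      if (started) printf "\n"       # finish the previous sequence line
      print                          # keep the whole header, not just $1
      started = 1
      next
    }
    { printf "%s", $0 }              # append sequence text without a newline
    END { if (started) printf "\n" } # terminate the last sequence line
  ' "$@"
}

# Usage on the example above, written with CRLF line endings.
# Prints:
# >prtn1
# DTENKRKKDFLTSENSLPRIS
printf '>prtn1\r\nDTENKRK\r\nKDFLTSE\r\nNSLPRIS\r\n' | unwrap_fasta
```

To process a file instead of a pipe, pass its name, e.g. unwrap_fasta proteins.fa > proteins_oneline.fa (both file names are hypothetical).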
