How to place "\n" after each sequence in a fasta file?
7.1 years ago
AsoInfo ▴ 300

Hi, I would like to place "\n" after every sequence in a FASTA file. My FASTA file looks like this:

>pdb|5U15|L Chain L, Crystal Structure Of Dh270.uca3 (unliganded) From The Dh270 Broadly Neutralizing N332-glycan Dependent Lineage
QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQHPGKAPKLMIYEVSKRPSGVSNRFSGSKSG
NTASLTISGLQAEDEADYYCCSYAGSSTVIFGGGTKLTVLGQPKGAPSVTLFPPSSEELQANKATLVCLI
SDFYPGAVTVAWKADSSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTV
APTECS
>pdb|5U15|H Chain H, Crystal Structure Of Dh270.uca3 (unliganded) From The Dh270 Broadly Neutralizing N332-glycan Dependent Lineage
QVQLVQSGAEVKKPGASVKVSCKASGYTFTGYYMHWVRQAPGQGLEWMGWINPNSGGTNYAQKFQGRVTM
TRDTSISTAYMELSRLRSDDTAVYYCARGGWISLYYDSSGYPNFDYWGQGTLVTVSGASTKGPSVFPLAP
SSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYIC
NVNHKPSNTKVDKRVEPKSCDKHHHHHH

I would like to modify the file as:

>pdb|5U15|L Chain L, Crystal Structure Of Dh270.uca3 (unliganded) From The Dh270 Broadly Neutralizing N332-glycan Dependent Lineage
QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQHPGKAPKLMIYEVSKRPSGVSNRFSGSKSG
NTASLTISGLQAEDEADYYCCSYAGSSTVIFGGGTKLTVLGQPKGAPSVTLFPPSSEELQANKATLVCLI
SDFYPGAVTVAWKADSSPVKAGVETTTPSKQSNNKYAASSYLSLTPEQWKSHRSYSCQVTHEGSTVEKTV
APTECS

>pdb|5U15|H Chain H, Crystal Structure Of Dh270.uca3 (unliganded) From The Dh270 Broadly Neutralizing N332-glycan Dependent Lineage
QVQLVQSGAEVKKPGASVKVSCKASGYTFTGYYMHWVRQAPGQGLEWMGWINPNSGGTNYAQKFQGRVTM
TRDTSISTAYMELSRLRSDDTAVYYCARGGWISLYYDSSGYPNFDYWGQGTLVTVSGASTKGPSVFPLAP
SSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYIC
NVNHKPSNTKVDKRVEPKSCDKHHHHHH

I came across a post about linearizing FASTA files but couldn't figure out how to apply it.

FASTA Modify • 1.5k views

The title of this post does not appear to match the request in the main body. If you just want to add gi_ before pdb, you could use sed 's/^>pdb/>gi_pdb/' your_file > new_file.


I have just edited the post. "gi" was added by mistake.

7.1 years ago
Joe 21k

A very simple solution would be:

cat file.fasta | sed 's/^>/\n>/' | tail -n+2

The sed step also adds an empty line at the start of the file, though I don't know if that's a problem for you. Linearising the FASTA and appending a newline to the end of each sequence line would be a more robust way to do it 'properly'.

Edit: use of tail removes the leading blank line.
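
For the linearising route mentioned above, here is a minimal Python sketch rather than a definitive implementation. The function names linearise_fasta and add_blank_lines are made up for illustration, and it assumes every record starts with a ">" header line; anything before the first header is dropped.

```python
def linearise_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif line.strip():
            # Skip blank lines, collect the wrapped pieces of the sequence.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def add_blank_lines(lines):
    """Return FASTA text with one sequence line per record, each followed by a blank line."""
    return "".join(f"{h}\n{s}\n\n" for h, s in linearise_fasta(lines))
```

To use it on a file, something like sys.stdout.write(add_blank_lines(open("file.fasta"))) should do, redirected to a new file once the output looks right. Note that, unlike the sed version, this unwraps each sequence onto a single line.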


No need to cat the file. Just the sed part should be enough.


Yeah I know, it's just how I prefer to do it. You get to see the file, then can just redirect the file with > if it looks right. sed -i makes me nervous :P

7.1 years ago
nepgorkhey ▴ 130

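A short Python 3 script like the following may work for you. It prints a blank line before every header except the first, so each sequence ends up followed by an empty line:

# Assumes the first line of file.fasta is a header
with open("file.fasta") as f:
    for i, line in enumerate(f):
        if line.startswith(">") and i > 0:
            print()              # blank line between records
        print(line, end="")      # line already ends with "\n"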