Script to print a FASTA file

4.1 years ago

ravi.eshwari ▴ 10

Hi,

I have FASTA files that look like this:

>CATGTAGTGATTGATAGTGATA(1)
CATGTAGTGATTGATAGTGATA
>ATCCGTGAGCTTGAAGGATCCGCC(1)
ATCCGTGAGCTTGAAGGATCCGCC
>AAAACTACATATACATTCGGATT(2)
AAAACTACATATACATTCGGATT
>CCTGCATAGAGGATTCCGAAC(1)
CCTGCATAGAGGATTCCGAAC
>CATGAACAAGATGTTTGAGAACT(1)
CATGAACAAGATGTTTGAGAACT

I need to edit the header of each sequence and, along with that, print the sequences according to their abundance, each under a unique header.

Can someone kindly help me with this?

How do I do this, either with Linux command-line tools or with a shell script?

Thank you!
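The question leaves the exact output open, so here is one possible reading as a hedged sketch. It treats the number in parentheses at the end of each header as the abundance and writes each sequence that many times, each copy under a new unique header. The function name expand_by_abundance and the >seqN header scheme are hypothetical choices for illustration, and the sketch assumes each sequence sits on a single line, as in the example above.

```shell
#!/bin/sh
# Sketch only: assumes the trailing "(N)" in each header is the abundance
# and that every sequence is on a single line.
# expand_by_abundance is a hypothetical helper name, not an existing tool.
expand_by_abundance() {
  awk '
    /^>/ {
      # Pull the count out of the trailing "(N)"; default to 1 if absent.
      n = 1
      if (match($0, /\([0-9]+\)$/)) n = substr($0, RSTART + 1, RLENGTH - 2) + 0
      next_is_seq = 1
      next
    }
    next_is_seq {
      # Print the sequence n times, each copy under its own numbered header.
      for (k = 1; k <= n; k++) printf(">seq%d\n%s\n", ++id, $0)
      next_is_seq = 0
    }
  ' "$@"
}
```

For example, expand_by_abundance SRR1266859.fa > SRR1266859.expanded.fa. Reading the count from the "(N)" suffix, rather than from the sequence text repeated in the header, keeps the sketch independent of what the rest of the header contains.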

And what have you tried so far?

4.1 years ago by lakhujanivijay 5.8k

I used the following command:

awk '/^>/{print ">seq1" ++i; next}{print}' < SRR1266859.fa > SRR1266859.new.fa

It did change the headers, but I now also need to print the sequences with an abundance greater than one, each with a unique header.

written 4.1 years ago by ravi.eshwari ▴ 10 • updated 4.1 years ago by lakhujanivijay 5.8k
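If the goal is to keep only sequences with an abundance greater than one and give each a unique header, the awk command above could be extended along these lines. This is a sketch under the assumption that the trailing (N) in each header is the abundance; keep_abundant and the >seqN_countN header format are hypothetical names chosen for the example. Because it prints every line up to the next header, it also passes through sequences wrapped over several lines.

```shell
#!/bin/sh
# Sketch only: assumes the trailing "(N)" in each header is the abundance.
# keep_abundant and its header format are hypothetical, for illustration.
# Usage: keep_abundant MIN_COUNT [FILE...]
keep_abundant() {
  min="$1"; shift
  awk -v min_count="$min" '
    /^>/ {
      # Read the abundance from the header; default to 1 if absent.
      n = 1
      if (match($0, /\([0-9]+\)$/)) n = substr($0, RSTART + 1, RLENGTH - 2) + 0
      keep = (n >= min_count + 0)
      # Give each kept record a fresh, unique header that still records n.
      if (keep) printf(">seq%d_count%d\n", ++id, n)
      next
    }
    keep { print }
  ' "$@"
}
```

For example, keep_abundant 2 SRR1266859.fa > SRR1266859.filtered.fa keeps records with an abundance of 2 or more. As a side note, print ">seq1" ++i in the original command concatenates the string with the counter, so the headers come out as >seq11, >seq12 and so on; print ">seq" ++i gives >seq1, >seq2.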

Please provide an example of the output you would expect - your question is unanswerable at the moment.

Edit it how? What abundance?

4.1 years ago by Joe 21k

In addition: technically the posted example qualifies as FASTA format, but having the sequence repeated in the header is going to make this hard to decipher.

4.1 years ago by GenoMax 141k