Question: FASTA sequence manipulation
poonam.bi01 wrote:

I have a FASTA file like this:

>1001796365 4.F.1.1.5
MDSIRPATFQIPAAVRELGWAALLLFFVLLSVHEWFSPPGWFGLLAILIFATQGALILTR
WPARQNFGWANRTTLLRSILVVSLVAWAPFLPAADSSALWIYGVACLIALILDGVDGKVA
>1002048002 2.A.4.2.8
MSPSRTARLYFLLVLDLLFFVLEISIGYAVGSLALVADSFHMLNDVVSLIIALYAIKLAA
SSTPTTRYSYGWHRAEILAALVNGVFLLALCFTITLEALERFFSTPEISNPKLIVLVGSL
>1002048004 2.A.4.5.2
IASDIRRILHRHGIHSSTIQPEYHPVRDTILEERSKDVNCLISCPPDSACCEVQACCPSY
AGT

Each header line is ordered as:

>first_id, then a tab (\t), then second_id

I want my sequences in this format:

>4.F.1.1.5
MDSIRPATFQIPAAVRELGWAALLLFFVLLSVHEWFSPPGWFGLLAILIFATQGALILTR
WPARQNFGWANRTTLLRSILVVSLVAWAPFLPAADSSALWIYGVACLIALILDGVDGKVA
>2.A.4.2.8
MSPSRTARLYFLLVLDLLFFVLEISIGYAVGSLALVADSFHMLNDVVSLIIALYAIKLAA
SSTPTTRYSYGWHRAEILAALVNGVFLLALCFTITLEALERFFSTPEISNPKLIVLVGSL
>2.A.4.5.2
IASDIRRILHRHGIHSSTIQPEYHPVRDTILEERSKDVNCLISCPPDSACCEVQACCPSY
AGT

That is, each header should keep only the second ID: >second_id, then a newline, then the sequence.
Tags: alignment, sequence • written 3.5 years ago by poonam.bi01 • modified 3.5 years ago by Daniel
  1. No greater-than sign means that it's not FASTA.
  2. The format you request looks very random in your example.
(comment by 5heikki, 3.5 years ago)

I think the greater-than sign gets auto-formatted, so the post probably doesn't reflect what OP had in mind.

(comment by WouterDeCoster, 3.5 years ago)
Daniel (Cardiff University) wrote:

To code golf-ify the answer, you could do it in fewer keystrokes with sed:

# 21 Keystrokes (+infile.fa)
sed -i 's/^>.\+ />/g' infile.fa

EDIT: Golfing harder:

# 19 Keystrokes (+infile.fa)
sed -i 's/>.\+ />/' infile.fa
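Two caveats on the one-liners above, which may or may not matter for the OP: `\+` in a basic regex and a bare `-i` are GNU sed behaviour (BSD/macOS sed requires an explicit, possibly empty, backup suffix after `-i`), and the pattern needs a literal space, while the question describes a tab between the two IDs. Below is a minimal sketch of a variant, assuming each header is `>`, the first ID, whitespace (space or tab), then the second ID. The function name `strip_first_id` is only illustrative:

```shell
# Hypothetical helper (name is illustrative): remove the first ID from each
# FASTA header line, keeping the rest of the header.
# -E enables extended regexes in both GNU and BSD sed; [[:space:]] matches
# a space or a tab. Output goes to stdout instead of editing in place.
strip_first_id() {
    sed -E 's/^>[^[:space:]]+[[:space:]]+/>/' "$@"
}

# Usage: strip_first_id infile.fa > outfile.fa
```

Unlike the greedy `.\+ ` in the answer, `[^[:space:]]+` stops at the first whitespace, so on a header with more than two fields it keeps everything after the first ID; for the two-field headers in the question both give the same result. Writing to a new file rather than editing in place also leaves the original untouched.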
(answer written 3.5 years ago by Daniel, later modified)
venu (Germany) wrote:

If I understand correctly, something like this should work:

cat file.fa | paste - - | awk '{print ">"$2"\n"$3}' > new_file.fa

PS: When I copy-paste your sequences, there is a gap in them. If that is just a formatting problem, it is fine; if not, make sure nothing is going wrong.

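Now that genomax2 has reformatted the post, the sequences turn out to span several lines, so first linearize the FASTA file (one sequence line per record) and then use the command above.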
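A minimal sketch of the "linearize first" step, assuming each record is a `>` header followed by one or more non-empty sequence lines (a header with no sequence would break the two-line pairing that `paste - -` relies on). The helper name `linearize_fasta` is hypothetical; it joins each record's wrapped sequence lines into one line so every record becomes exactly two lines:

```shell
# Hypothetical helper: join the wrapped sequence lines of each record into a
# single line, so every record becomes exactly two lines (header, sequence).
linearize_fasta() {
    awk '
        /^>/ {
            if (seq != "") print seq   # flush the sequence of the previous record
            print                      # print the header line unchanged
            seq = ""
            next
        }
        { seq = seq $0 }               # append this sequence line
        END { if (seq != "") print seq }
    ' "$@"
}

# Usage, combined with the command from the answer:
# linearize_fasta file.fa | paste - - | awk '{print ">"$2"\n"$3}' > new_file.fa
```

If linearizing is not otherwise needed, a single awk pass such as `awk '/^>/ {print ">" $2; next} {print}' file.fa > new_file.fa` should produce the same headers while keeping the original line wrapping.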

(answer written 3.5 years ago by venu, later modified)