Question: How to add a specific word to the first word in a FASTA header
0
4.7 years ago by
empyrean999160
Canada
empyrean999160 wrote:

I have a FASTA file whose headers come from several assemblies. Some contigs were assembled with different versions of the assembly, so they ended up with the same FASTA names. When I try to make a BLAST database with "-parse_seqids", it complains about duplicate IDs. So I would like to append the assembly version to the first word of each FASTA header.

Examples 

Input FASTA sequences (only the headers are shown here):

>Contig1_Node1_length20_cov30 Date:03/01/2015 Sequence_Organism:Other 

>Contig2_deg1 Date:03/01/2015 Sequence_Organism:Other 

>Contig3_jcg20839 Date:03/01/2015 Sequence_Organism:Other 

Desired output FASTA headers:

>Contig1_Node1_length20_cov30_V2 Date:03/01/2015 Sequence_Organism:Other 

>Contig2_deg1_V2 Date:03/01/2015 Sequence_Organism:Other 

>Contig3_jcg20839_V2 Date:03/01/2015 Sequence_Organism:Other 

 

awk unix sed perl • 1.5k views
modified 4.7 years ago by Frédéric Mahé • written 4.7 years ago by empyrean999

Exclude "-parse_seqids" when creating the BLAST database. It will then not give any error.

written 4.7 years ago by Renesh

True, but I need -parse_seqids to extract sequences from the FASTA file.

written 4.7 years ago by empyrean999
4
4.7 years ago by
France, Montpellier, CIRAD
Frédéric Mahé3.0k wrote:

Hi, here is a sed solution:

sed -e '/^>/ s/^>[^ ]*/&_V2/' input.fa > output.fa

and an awk solution:

awk '/^>/ {sub(/^>[^ \t]*/, "&_V2")} {print}' input.fa > output.fa
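
To check that tagging really removes the duplicate IDs, something like the following could be run on toy data. This is only a sketch: the file names v1.fa and v2.fa are made up for illustration, and the "ID" is assumed to be the first space-delimited word of each header.

```shell
# Toy data: two assemblies that share contig names (hypothetical files).
tmp=$(mktemp -d)
printf '>Contig1_Node1 Date:03/01/2015\nACGT\n>Contig2_deg1 Date:03/01/2015\nGGCC\n' > "$tmp/v1.fa"
cp "$tmp/v1.fa" "$tmp/v2.fa"

# Append _V2 to the first word of each header in the second assembly only.
sed -e '/^>/ s/^>[^ ]*/&_V2/' "$tmp/v2.fa" > "$tmp/v2.tagged.fa"

# Collect IDs (first word of each header) that occur more than once.
dups_before=$(cat "$tmp/v1.fa" "$tmp/v2.fa" | grep '^>' | cut -d ' ' -f 1 | sort | uniq -d)
dups_after=$(cat "$tmp/v1.fa" "$tmp/v2.tagged.fa" | grep '^>' | cut -d ' ' -f 1 | sort | uniq -d)

echo "duplicate IDs before tagging: ${dups_before:-none}"
echo "duplicate IDs after tagging: ${dups_after:-none}"
```

If the second line reports "none", the combined file should no longer trigger the duplicate-ID complaint for those headers.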
modified 4.7 years ago • written 4.7 years ago by Frédéric Mahé
2
4.7 years ago by
mxs530
mxs530 wrote:

OK, the simplest way is again either a Perl or an awk script:

perl -pe 's/^(>\S+)/${1}_V2/' input.fa > output.fa

_V2 in the above line is the extension you are adding. If there are several different extensions, you should create a key table and preload it as a hash table. Again, everything can be done in a single line.
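
As a rough sketch of the key-table idea (written in awk rather than Perl), the table can be a two-column, tab-separated file mapping each FASTA file name to its extension. The file names asm_a.fa, asm_b.fa and versions.tsv are hypothetical, and the sketch assumes the table is not empty.

```shell
# Toy data: two assemblies with the same contig name (hypothetical files).
tmp=$(mktemp -d)
printf '>Contig1_Node1 Date:03/01/2015\nACGT\n' > "$tmp/asm_a.fa"
printf '>Contig1_Node1 Date:03/01/2015\nTTGA\n' > "$tmp/asm_b.fa"

# Key table: FASTA file name, then the extension to append (tab separated).
printf 'asm_a.fa\t_V1\nasm_b.fa\t_V2\n' > "$tmp/versions.tsv"

(
    cd "$tmp" || exit 1
    awk -F '\t' '
        NR == FNR { ext[$1] = $2; next }                   # first file: load the key table into an array
        /^>/      { sub(/^>[^ \t]*/, "&" ext[FILENAME]) }  # append the extension to the first word
                  { print }
    ' versions.tsv asm_a.fa asm_b.fa > combined.fa
)

grep '^>' "$tmp/combined.fa"
```

The array ext plays the role of the hash table: it is filled once from the key table, then looked up by the name of the file currently being read.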

hope this helps

cheers

mxs

PS

please ask if anything is unclear regarding the above solution

modified 4.7 years ago • written 4.7 years ago by mxs