Rename several fasta-headers
5.1 years ago

Hey guys,

I have a multi-fasta file containing several extracted regions, such as

>NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca...
>NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct...
...

I would like to prepend the strain name to each fasta header, such as

>Enterobacter_sp._MGH_6_NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca...
>Enterobacter_hormaechei_subsp._xiangfangensis_strain_34984_NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct...
...

Could you please help me out? Thanks!

genome assembly • 2.2k views

See the post "Renaming Entries In A Fasta File" and the many related posts listed in its right panel, such as "Rename fasta headers", "How to move the last 4 characters of all FASTA headers to the beginning?", and "Renaming fasta file headers".

They contain many awk and sed scripts that may give you some hints.


Where are the strain names coming from? A separate file/NCBI search?


From a simple NCBI search! I don't have a separate file listing the strain name for each accession, and the suggested links don't help with this issue. Can you help me out? Thanks!


The following will get you part way there.

Step 1: Look up the organism names for the accessions in your fasta headers. (The command below works with the small example snippet above, saved in a file called test.)

awk -F '>|_' '/^>/ {print $2"_"$3}' test | xargs -n 1 sh -c 'efetch -db nuccore -id "$0" -format docsum | xtract -pattern DocumentSummary -element Caption,Organism' > names.txt

names.txt now contains the names of the organisms.
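To check what the awk stage hands to efetch before making any NCBI requests, it can be run on its own. This is a minimal sketch using the two headers from the question; the file names test.fa and accessions.txt are assumptions made for illustration.

```shell
# Sample input copied from the question (file name test.fa is hypothetical).
cat > test.fa <<'EOF'
>NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca
>NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct
EOF

# Split header lines on ">" or "_": field 2 is the "NZ" prefix,
# field 3 is the accession with its version; rejoin them with "_".
awk -F '>|_' '/^>/ {print $2"_"$3}' test.fa > accessions.txt

cat accessions.txt
```

Note that this field split relies on the accession containing exactly one underscore (as in the NZ_ prefix). A header built from an accession without an underscore would need a different split.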

Step 2: Use one of the solutions in "Renaming fasta headers according to a matching name list" to do the replacements. There is a small issue, though: names.txt does not contain the version number of the accession (e.g. NZ_KI973281 rather than NZ_KI973281.1), so the solutions may need to be adapted to suit your needs.
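One possible way to handle the missing version number is sketched below. It is not taken from the linked post. It assumes names.txt has two tab-separated columns (accession without version, then organism name), and the sample names.txt contents and the file names test.fa and renamed.fa are made up for illustration. Whatever name NCBI returns is what ends up in the header, so a strain may have to be added to names.txt by hand if it is not part of that name.

```shell
# Sample input copied from the question (file name test.fa is hypothetical).
cat > test.fa <<'EOF'
>NZ_KI973281.1_1234..56789
atattgagctaaaaaaatcagttttccca
>NZ_LAAL01000032.1_5456..32476
tgcagaagtaagggggtaacaccatgcct
EOF

# Hypothetical names.txt: accession without version, a tab, the organism name.
printf 'NZ_KI973281\tEnterobacter sp. MGH 6\n' > names.txt
printf 'NZ_LAAL01000032\tEnterobacter hormaechei subsp. xiangfangensis\n' >> names.txt

awk -F '\t' '
  # First file (names.txt): store each name with spaces turned into underscores.
  NR == FNR { name = $2; gsub(/ /, "_", name); names[$1] = name; next }
  # Header lines of the fasta file.
  /^>/ {
    header = substr($0, 2)          # header without the leading ">"
    acc = header
    sub(/\.[0-9]+_.*$/, "", acc)    # drop ".version_start..end" to get the lookup key
    if (acc in names) print ">" names[acc] "_" header
    else print                      # leave headers without a match unchanged
    next
  }
  # Sequence lines pass through untouched.
  { print }
' names.txt test.fa > renamed.fa

cat renamed.fa
```

Headers with no entry in names.txt are left as they are, so it is easy to spot accessions that still need a name.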
