Question: How to add file name after > in a multi-fasta file?
7 months ago, saadleeshehreen wrote:

Hi, I have a multi-FASTA file called HTH_7.fasta:

>W1DFQ1_KLEPN/141-185
 GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD
>HIN_SALAE/139-183
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS
>GIN_BPMU/138-182
GRPPKLTKAEWEQAGRLLAQGIPRKQVALIYDVALSTLYKKHPAK
>CIN_BPP1/138-182
GRRPKYQEETWQQMRRLLEKGIPRKQVAIIYDVAVSTLYKKFPAS
>UVP1_ECOLX/144-189
GRKPSLSEDDINEMKILLADPEMTVGAVAKRFNVSRMTIYRYTTKG

I want to append the file name (HTH_7) to the end of each ">" header line, like this:

>W1DFQ1_KLEPN/141-185_HTH_7
 GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD
>HIN_SALAE/139-183_HTH_7
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS

I tried the following, but it does not do what I want, because the file name is also appended to the sequence lines. How can I avoid that?

awk '{print $0 "_"FILENAME}' HTH_7.fasta | sed "s/.fasta//" | head
>W1DFQ1_KLEPN/141-185_HTH_7
GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD_HTH_7
>HIN_SALAE/139-183_HTH_7
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS_HTH_7
>GIN_BPMU/138-182_HTH_7
 GRPPKLTKAEWEQAGRLLAQGIPRKQVALIYDVALSTLYKKHPAK_HTH_7
>CIN_BPP1/138-182_HTH_7
GRRPKYQEETWQQMRRLLEKGIPRKQVAIIYDVAVSTLYKKFPAS_HTH_7
>UVP1_ECOLX/144-189_HTH_7
GRKPSLSEDDINEMKILLADPEMTVGAVAKRFNVSRMTIYRYTTKG_HTH_7

Thanks in advance


Try:

$ sed '/^>/ s/$/_HTH_7/' input.fa
(comment written 7 months ago by cpad0112)
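
The sed command above hard-codes the suffix. As a minimal sketch, the suffix could instead be derived from the file name with basename. The two-record sample file and the output name HTH_7.renamed.fasta are assumptions made only so the example runs on its own, and the sketch assumes the file name contains no characters that are special in a sed replacement (such as & or /).

```shell
# Hypothetical two-record sample so the sketch is self-contained;
# point "fasta" at your real file instead.
fasta=HTH_7.fasta
printf '%s\n' \
  '>W1DFQ1_KLEPN/141-185' \
  'GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD' \
  '>HIN_SALAE/139-183' \
  'GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS' > "$fasta"

# Strip any directory and the .fasta extension: HTH_7.fasta -> HTH_7
suffix=$(basename "$fasta" .fasta)

# On lines starting with ">", replace end-of-line with _<suffix>.
# The \$ keeps the shell from treating $ as a variable prefix.
sed "/^>/ s/\$/_${suffix}/" "$fasta" > "${suffix}.renamed.fasta"

cat "${suffix}.renamed.fasta"
```

Because the address /^>/ restricts the substitution to header lines, the sequence lines pass through unchanged.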
7 months ago, saadleeshehreen wrote:

awk '/>/{sub(">","&"FILENAME"_");sub(/\.fasta/,x)}1' sample_1.fasta
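
Note that the one-liner above inserts the file name directly after ">", i.e. as a prefix. If the suffix form shown in the question is wanted, a sketch along these lines might work. The sample file and the output name HTH_7.suffixed.fasta are assumptions for illustration only.

```shell
# Hypothetical sample input so the sketch is self-contained.
printf '%s\n' \
  '>GIN_BPMU/138-182' \
  'GRPPKLTKAEWEQAGRLLAQGIPRKQVALIYDVALSTLYKKHPAK' \
  '>CIN_BPP1/138-182' \
  'GRRPKYQEETWQQMRRLLEKGIPRKQVAIIYDVAVSTLYKKFPAS' > HTH_7.fasta

awk '
  FNR == 1 {                   # first line of each input file
    tag = FILENAME
    sub(/^.*\//, "", tag)      # drop any leading directory
    sub(/\.[^.]*$/, "", tag)   # drop the last extension (.fasta, .fa, ...)
  }
  /^>/ { $0 = $0 "_" tag }     # change header lines only
  { print }                    # print every line, changed or not
' HTH_7.fasta > HTH_7.suffixed.fasta

cat HTH_7.suffixed.fasta
```

Anchoring the pattern with ^ avoids touching a line that merely contains ">" somewhere else, and computing the tag once per file keeps the per-line work minimal.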

7 months ago, lakhujanivijay (India/Ahmedabad) wrote:

Using seqkit:

seqkit replace -p '(.+)' -r '${1}_HTH_7' sample_1.fasta

7 months ago, Hugo (Universidade de Vigo, Ourense, Spain) wrote:

You can use SEDA, an open-source application for processing FASTA files containing DNA and protein sequences. Its Rename header operation has an Add prefix/suffix mode (https://www.sing-group.org/seda/manual/operations.html#add-prefix-suffix) that lets you add the text you want at the beginning or at the end of the headers.
