Question: How to add file name after > in a multi-fasta file?
29 days ago by
saadleeshehreen70 wrote:

Hi, I have a multi-FASTA file called HTH_7.fasta:

>W1DFQ1_KLEPN/141-185
 GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD
>HIN_SALAE/139-183
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS
>GIN_BPMU/138-182
GRPPKLTKAEWEQAGRLLAQGIPRKQVALIYDVALSTLYKKHPAK
>CIN_BPP1/138-182
GRRPKYQEETWQQMRRLLEKGIPRKQVAIIYDVAVSTLYKKFPAS
>UVP1_ECOLX/144-189
GRKPSLSEDDINEMKILLADPEMTVGAVAKRFNVSRMTIYRYTTKG

I want to append the file name HTH_7 to the end of each ">" header line:

>W1DFQ1_KLEPN/141-185_HTH_7
 GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD
>HIN_SALAE/139-183_HTH_7
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS

I tried the following, but it does not do what I want, because the file name is also appended to the sequence lines. How can I restrict it to the header lines?

awk '{print $0 "_"FILENAME}' HTH_7.fasta | sed "s/.fasta//" | head
>W1DFQ1_KLEPN/141-185_HTH_7
GRKKSLSSERIAELRQRVEAGEQKTKLAREFGISRETLYQYLRTD_HTH_7
>HIN_SALAE/139-183_HTH_7
GHPRAINRHEQEQISRLLEKGHPRQQLAIIFGIGVSTLYRYFPAS_HTH_7
>GIN_BPMU/138-182_HTH_7
 GRPPKLTKAEWEQAGRLLAQGIPRKQVALIYDVALSTLYKKHPAK_HTH_7
>CIN_BPP1/138-182_HTH_7
GRRPKYQEETWQQMRRLLEKGIPRKQVAIIYDVAVSTLYKKFPAS_HTH_7
>UVP1_ECOLX/144-189_HTH_7
GRKPSLSEDDINEMKILLADPEMTVGAVAKRFNVSRMTIYRYTTKG_HTH_7

Thanks in advance

header sequence rename fasta • 113 views

try:

$ sed '/^>/ s/$/_HTH_7/' input.fa
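
If the suffix should come from the file name instead of being typed by hand, the same sed idea can be wrapped in a small shell function. This is only a sketch: `add_suffix` is a made-up name, and it assumes the file ends in `.fasta` and that the base name contains no `/` or `&` characters, which would need escaping in the sed replacement.

```shell
# Hypothetical helper: print a FASTA file with "_<basename>" appended
# to every header line; sequence lines are left untouched.
add_suffix() {
    file=$1
    # Strip the directory and the .fasta extension: HTH_7.fasta -> HTH_7
    name=$(basename "$file" .fasta)
    # Only lines starting with ">" are edited; "$" anchors the end of the line.
    sed "/^>/ s/\$/_${name}/" "$file"
}
```

It could then be called as `add_suffix HTH_7.fasta > HTH_7.renamed.fasta`.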
written 28 days ago by cpad0112
29 days ago by
saadleeshehreen70 wrote:

awk '/>/{sub(">","&"FILENAME"_");sub(/\.fasta/,x)}1' sample_1.fasta
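
Note that the command above puts the file name directly after ">", so it acts as a prefix. For a suffix, as asked in the question, a variant along these lines might work. It is a sketch under the assumption that the files end in `.fasta`; `suffix_headers` is just an illustrative name.

```shell
# Hypothetical helper: append "_<file name without .fasta>" to header lines
# only. Accepts one or more FASTA files and prints the result to stdout.
suffix_headers() {
    awk '
        # At the first line of each input file, derive the suffix from FILENAME.
        FNR == 1 { name = FILENAME; sub(/^.*\//, "", name); sub(/\.fasta$/, "", name) }
        # Header lines get the suffix appended.
        /^>/     { $0 = $0 "_" name }
        # Every line (header or sequence) is printed.
        { print }
    ' "$@"
}
```

For example, `suffix_headers HTH_7.fasta` would print the headers as `>W1DFQ1_KLEPN/141-185_HTH_7` while leaving the sequence lines unchanged.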

29 days ago by
lakhujanivijay5.0k
India
lakhujanivijay5.0k wrote:

Using seqkit:

seqkit replace -p '(.+)' -r '${1}_HTH_7' sample_1.fasta

28 days ago by
Hugo280
Universidade de Vigo, Ourense (Spain)
Hugo280 wrote:

You can use SEDA, an open-source application for processing FASTA files containing DNA and protein sequences. The Rename header operation has an Add prefix/suffix mode (https://www.sing-group.org/seda/manual/operations.html#add-prefix-suffix) that lets you add the text you want at the beginning or the end of the headers; in your case, as a suffix.
