Question: Batch rename protein fasta headers
6 months ago by
genomes_and_MGEs10 wrote:

Hey guys,

I have tons of protein multi-fasta files and I would like to prepend the name of each file (without its extension) to the fasta headers. For example, for an input file one.txt with the headers

>1
ATGC...
>2
ATGCAT...

I would like to have the output

>one_1
ATGC...
>one_2
ATGCAT...

I use bbrename for DNA sequences, but it doesn't work for protein files. Thanks!
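
For reference, the requested transformation can be sketched in plain Python with no external dependencies. The function name `prefix_headers` and the `sep` parameter are hypothetical, chosen only to reproduce the example above. Only lines that start with `>` are touched, so it should make no difference whether the sequences are DNA or protein.

```python
from pathlib import Path


def prefix_headers(fasta_text: str, prefix: str, sep: str = "_") -> str:
    """Return fasta_text with `prefix + sep` inserted right after each leading '>'.

    `prefix_headers` and `sep` are illustrative names, not part of any library.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep '>' first, then the prefix, then the original header.
            line = f">{prefix}{sep}{line[1:]}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    example = ">1\nATGC\n>2\nATGCAT\n"
    # Path("one.txt").stem is the file name without its extension.
    print(prefix_headers(example, Path("one.txt").stem), end="")
```

Sequence lines are passed through unchanged, which is why the same sketch should apply to nucleotide and protein files alike.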


Your example files also don't really look like protein sequences either ...

written 6 months ago by lieven.sterck
6 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum130k wrote:
awk '/^>/ {printf(">%s_%s\n",substr(FILENAME,1,length(FILENAME)-4),substr($1,2));next;} {print}' *.txt
6 months ago by
Joe18k
United Kingdom
Joe18k wrote:

As easy as:

for file in /path/to/files/*.fasta ; do
    sed "s/^>/>$(basename "$file" .fasta)_/" "$file"
done

You can tweak it if you want to keep the extension or whatever...

6 months ago by
Hood0
Hood0 wrote:

You could use a simple Python script like:

from Bio import SeqIO

# Read one.txt and write a copy whose record IDs carry the "one_" prefix.
with open("one.txt", "r") as in_handle:
    with open("output_filename.fasta", "w") as out_handle:
        for record in SeqIO.parse(in_handle, "fasta"):
            record.id = f"one_{record.id}"
            record.description = ""  # drop the original header text so only the new ID is written
            SeqIO.write(record, out_handle, "fasta")

This requires Biopython to be installed.


Thanks for the reply. The thing is that I have multiple files to use as input. So, I guess using a loop to create a renamed output for each file would be better. Do you think you can help me with this? Usually for DNA sequences, I use

for F in *.fasta; do N=$(basename $F .fasta) ; bbrename.sh in=$F out=${N}_mod.fasta prefix=$F addprefix=t ; done

I need to find an alternative that works with multi-fasta protein files as input

written 6 months ago by genomes_and_MGEs
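
One possible way to loop over many files, sketched with the Python standard library only (so it is not the Biopython approach from the answer above, just plain text handling). The function name `rename_headers_in_dir` and the `_mod` suffix are assumptions that mirror the `${N}_mod.fasta` naming in the bbrename loop. Unlike `prefix=$F` in that loop, the prefix used here is the file name without its extension.

```python
from pathlib import Path


def rename_headers_in_dir(directory, pattern="*.fasta", suffix="_mod"):
    """Write a header-prefixed copy of every matching file; return the new paths.

    Hypothetical helper: for one.fasta it writes one_mod.fasta, in which
    every header '>X' becomes '>one_X'. Sequence lines are copied as-is.
    """
    written = []
    # sorted() collects the file list up front, so files created by this
    # loop are not picked up during the same run.
    for src in sorted(Path(directory).glob(pattern)):
        if src.stem.endswith(suffix):
            continue  # skip outputs left over from a previous run
        dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
        with src.open() as fin, dst.open("w") as fout:
            for line in fin:
                if line.startswith(">"):
                    line = f">{src.stem}_{line[1:]}"
                fout.write(line)
        written.append(dst)
    return written
```

Calling `rename_headers_in_dir(".")` would process every `*.fasta` file in the current directory. Passing `pattern="*.txt"` should cover inputs named like one.txt.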
6 months ago by
Fatima610
United states
Fatima610 wrote:
for f in *.fasta; do n=${f%.fasta}; sed "s/^>/>${n}_/" "$f" > "${n}_new.fasta"; done

6 months ago by
lakhujanivijay5.2k
India
lakhujanivijay5.2k wrote:

Using seqkit (the prefix one_ is hard-coded here, so it has to be adjusted per file):

seqkit replace -p '(.+)' -r 'one_$1' Filename.fasta
