Question: How to remove FASTA headers in a multi-FASTA file and write the file name as the FASTA header?
Dineshkumar K (Kasaragod, Kerala, India) wrote, 10 days ago:

I have a FASTA file named 119XCA.fasta, shown below:

>cellulase
ATGCTA
>gyrase
TGATGCT
>16s
TAGTATG

I need to remove all the FASTA headers, keep the sequences in their original order, and write the file name (without its extension) as the single FASTA header. The expected output is shown below:

>119XCA
ATGCTA
TGATGCT
TAGTATG

I have used the command sed '/^>/d' foo.fa > out.fa, which removes the FASTA headers, but I do not know how to write the file name as the header. Please help me do that.

Shred wrote, 10 days ago:

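Assuming you're using Bash, use basename to get the file name without its path, and cut to drop the extension. For example:

filename=$(basename your.fasta | cut -d'.' -f1)

Then you can replace the header with it using sed (-i edits the file in place):

sed -i "s/^>.*$/>$filename/" your.fasta

Remember to use double quotes so that the shell expands the variable inside the sed expression.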
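
Putting the two steps together with the asker's own sed '/^>/d' command gives the expected single-header output. The following is only a minimal sketch: the function name single_header is hypothetical, and it assumes the part of the file name before the last dot is what should become the header.

```shell
# Hypothetical helper: print one header built from the file name,
# followed by every non-header line of the input.
# The body runs in a subshell ( ... ) so that "name" does not leak out.
single_header() (
    name=$(basename "$1")   # strip the directory part
    name=${name%.*}         # strip the last extension, e.g. .fasta
    printf '>%s\n' "$name"
    sed '/^>/d' "$1"        # delete all original header lines
)
```

It could be used as single_header 119XCA.fasta > 119XCA.fa, which leaves the input file untouched instead of editing it in place.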


I don't think this will concatenate the sequence?

(reply by Joe, 10 days ago)

He said he's already got the concatenated file.

(reply by Shred, 10 days ago)
cpad0112 (India) wrote, 10 days ago:

try this:

$ cat test.fa
>cellulase
ATGCTA
>gyrase
TGATGCT
>16s
TAGTATG

$  awk 'BEGIN {print ">"ARGV[1]};!/^>/{print}' test.fa

>test.fa
ATGCTA
TGATGCT
TAGTATG

$ cat <(echo ">$(basename test.fa)") <(grep -v ">" test.fa)
>test.fa
ATGCTA
TGATGCT
TAGTATG
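
The awk command above prints the argument exactly as given, so the header keeps the extension (and any directory prefix). If the header should be the bare name, as in the question's >119XCA example, the path and extension can be trimmed inside awk. This is a sketch only; the function name header_from_filename is made up for illustration, and it assumes everything after the last dot is the extension.

```shell
# Hypothetical wrapper around the awk approach:
# ARGV[1] is the input path; strip the directory and the last extension,
# print that as the only header, then print every non-header line.
header_from_filename() {
    awk '
        BEGIN {
            name = ARGV[1]
            sub(/.*\//, "", name)       # drop everything up to the last slash
            sub(/\.[^.]*$/, "", name)   # drop the last extension
            print ">" name
        }
        !/^>/                           # default action: print sequence lines
    ' "$1"
}
```

For example, header_from_filename /path/to/119XCA.fasta would start its output with >119XCA.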
Joe (United Kingdom) wrote, 10 days ago:

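Not the prettiest code in the world, but this will work.

Run it like so: bash scriptname.sh /path/to/files/*.fasta

for file in "$@" ; do
    # Keep the first header and delete the rest; join all sequence lines into one;
    # re-wrap the sequence at 80 characters; replace the header with the file name.
    sed -e '1!{/^>.*/d;}' "$file" | \
                sed ':a;N;$!ba;s/\n//2g' | \
                sed '1!s/.\{80\}/&\n/g' | \
                sed "s|>.*$|>${file##*/}|g" > "$(basename "$file" ".fasta").fa"
done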

You can also do it as a one-liner for a single file if needed (set file first, because the last two steps use it):

file=filename.fasta; sed -e '1!{/^>.*/d;}' "$file" | sed ':a;N;$!ba;s/\n//2g' | sed '1!s/.\{80\}/&\n/g' | sed "s|>.*$|>${file##*/}|g" > "$(basename "$file" ".fasta").fa"
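
The sed pipeline leans on behaviour that, as far as I know, is specific to GNU sed (combining a number with g in the s flags, and \n in the replacement text). Where that is a concern, the same join-and-rewrap idea can be sketched in awk. The function name merge_fasta and the optional width argument are assumptions made for this example, and unlike the script above the header here is the file name without its extension.

```shell
# Hypothetical helper: $1 is the input multi-FASTA file,
# $2 is the line width (optional, defaults to 80).
# The body runs in a subshell ( ... ) so that "name" does not leak out.
merge_fasta() (
    name=$(basename "$1")
    name=${name%.*}                      # file name without its last extension
    awk -v name="$name" -v width="${2:-80}" '
        BEGIN { print ">" name }         # the single new header
        !/^>/ { seq = seq $0 }           # append sequence lines, skip headers
        END {
            # print the joined sequence in chunks of "width" characters
            for (i = 1; i <= length(seq); i += width)
                print substr(seq, i, width)
        }
    ' "$1"
)
```

It could be called as merge_fasta 119XCA.fasta > 119XCA.fa. Note that it holds the whole joined sequence in memory, which should be fine for gene-sized records but may not suit very large files.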

(Note: the first three sed calls are useful for concatenating any FASTA file.)

(comment by Joe, 10 days ago)