Question: Changing names of Fasta headers
tpaisie70 (University of Florida) wrote, 22 months ago:

So I have a directory full of fasta files and I want to change the fasta header in each one to the name of its corresponding fasta file. For example:

HC1993.fa

> X58834
CCTGCATCTGCAA

HC1993.fa

> HC1993
CCTGCATCTGCAA

I have about 50 fasta files like that in a directory that I want to do the same thing to. I've been using this sed command, which works for one file:

sed 's/>.*/>HC1993/' HC1993.fa > new/HC1993.fa

But now I want to loop this command through the directory and this is the command I have been using:

for i in $(ls *.fa | rev | cut -c 4- | rev | uniq)
do
    sed 's/>.*/>${i}/' ${i}.fa > new/${i}.fa
done

This command gives me the following header in all the new fasta files:

HC1993.fa

>${i}
CCTGCATCTGCAA

Now I know there are a bunch of ways to fix this, but could someone help me fix the bash loop I made? I want to learn what is wrong with my command and how to fix it. Thanks!

bash loop unix sed sequence fasta • 2.8k views
Joe16k (United Kingdom) wrote, 22 months ago:

As I understand it, you just want to replace the header in each file with the filename?

e.g. given:

~/test/seqs$ ls
seq1.fasta  seq2.fasta  seq3.fasta
~/test/seqs$ cat seq*
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>tpg|Pyricularia_pennisetigena|AB818016
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC

So, with GNU sed:

for file in *.fasta; do sed -i "s/^>.*/>${file%.*}/" "$file"; done

Yields:

~/test/seqs$ for file in *.fasta; do sed -i "s/^>.*/>${file%.*}/" "$file"; done
~/test/seqs$ cat seq*
>seq1
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>seq2
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>seq3
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC
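
The ${file%.*} part of that command is a shell parameter expansion that strips the extension from the filename. A minimal sketch of how it behaves, using a hypothetical filename with two dots (sample.v2.fasta is made up for illustration and does not appear in the thread):

```shell
# Hypothetical filename, chosen only to show the difference between % and %%.
file="sample.v2.fasta"

# % removes the shortest suffix matching the pattern ".*",
# i.e. only the last extension.
short="${file%.*}"

# %% removes the longest suffix matching ".*",
# i.e. everything from the first dot onwards.
long="${file%%.*}"

echo "$short"   # sample.v2
echo "$long"    # sample
```

For names like seq1.fasta both forms give seq1, so the choice only matters when a filename contains more than one dot.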

So yes, your interpretation of what I would like is correct. However, when I used your command I got this error:

sed: 1: "HC1993.fa": extra characters at the end of H command

And it is not making the new fasta files with the new headers.

Reply written 22 months ago by tpaisie70

Are you using Mac OS?

Reply written 22 months ago by Joe16k
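
For context, that error message is what BSD sed (the version shipped with macOS) reports for a bare -i: there, -i expects a backup suffix as its argument, so the substitution script is taken as the suffix and the filename is then parsed as the sed script. One way around it, sketched below under the assumption that the files can simply be overwritten, is to avoid -i altogether and go through a temporary file. The scratch directory and demo file are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: rename headers without sed -i, so the loop behaves the same
# with GNU sed and BSD/macOS sed.

# Demo data in a scratch directory (hypothetical, for illustration only).
workdir=$(mktemp -d)
printf '>X58834\nCCTGCATCTGCAA\n' > "$workdir/HC1993.fa"

for file in "$workdir"/*.fa; do
    # Filename without directory and without the .fa extension.
    name=$(basename "$file" .fa)
    tmp="$file.tmp"
    # Write the edited copy first; replace the original only if sed succeeded.
    # Assumes the name contains no "/" or "&", which are special to sed here.
    sed "s/^>.*/>$name/" "$file" > "$tmp" && mv "$tmp" "$file"
done

cat "$workdir/HC1993.fa"
```

On macOS, passing an empty suffix (sed -i '' "s/.../.../" file) is the other common workaround, but that form is not accepted by GNU sed, so the temporary-file version is the more portable of the two.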
cpad011212k (India) wrote, 22 months ago:

Example fasta:

$ cat HC1993.fa 
>X58834 
CCTGCATCTGCAA

Expected output (assumption is that first line in each fasta file is fasta header):

$ cat HC1993.fa 
>HC1993
CCTGCATCTGCAA

in bash:

$ for i in *.fa; do sed "1s/.*/>${i%.fa}/" "$i"; done
>HC1993
CCTGCATCTGCAA

using GNU parallel:

$ parallel 'sed "1s/.*/>{.}/" {}' ::: *.fa
>HC1993
CCTGCATCTGCAA
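
Both commands above print the result to the terminal and rely on the stated assumption that line 1 is the header. A possible variant, sketched here with made-up demo files, writes each result into a new/ directory as in the original question and skips any file whose first line does not start with ">". The scratch directory and the headerless file broken.fa are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: same 1s/.*/.../ substitution, but saved to new/ and guarded by a
# check that the first line really is a fasta header.

# Demo data (hypothetical, for illustration only).
workdir=$(mktemp -d)
mkdir -p "$workdir/new"
printf '>X58834\nCCTGCATCTGCAA\n' > "$workdir/HC1993.fa"
printf 'CCTGCATCTGCAA\n' > "$workdir/broken.fa"    # no header line

for path in "$workdir"/*.fa; do
    i=$(basename "$path")
    first=$(head -n 1 "$path")
    case $first in
        '>'*)
            # First line is a header: replace it with the file's base name.
            sed "1s/.*/>${i%.fa}/" "$path" > "$workdir/new/$i"
            ;;
        *)
            # Otherwise leave the file alone and report it.
            echo "skipping $i: first line is not a fasta header" >&2
            ;;
    esac
done

cat "$workdir/new/HC1993.fa"
```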

Ohh thank you so much that worked!!!!

Reply written 22 months ago by tpaisie70

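For future reference, the code can be shortened further using the one-line form of the c (change) command in GNU sed. Note that this replaces every header line in the file, not just the first line, and that the replacement text must include the leading ">":

$ parallel 'sed "/^>/ c >{.}" {}' ::: *.fa
Reply written 20 months ago by cpad011212k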
cpad011212k (India) wrote, 22 months ago:

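Try changing sed 's/>.*/>${i}/' to sed "s/>.*/>${i}/". Inside single quotes the shell does not expand ${i}, so sed receives the literal text ${i}; inside double quotes the variable is expanded before sed runs.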
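
To make the difference visible, here is a small sketch that first prints the sed script as the shell passes it under each quoting style, then runs the loop from the question with double quotes. Two things are assumptions rather than part of the thread: the scratch directory with its demo file, and the use of a plain glob with basename in place of the ls | rev | cut | rev pipeline (a substitution made only to keep the example short).

```shell
#!/usr/bin/env bash
# Demo data in a scratch directory (hypothetical, for illustration only).
workdir=$(mktemp -d)
mkdir "$workdir/new"
printf '>X58834\nCCTGCATCTGCAA\n' > "$workdir/HC1993.fa"

# What sed actually receives under each quoting style.
i="HC1993"
single='s/>.*/>${i}/'    # single quotes: ${i} is left as literal text
double="s/>.*/>${i}/"    # double quotes: ${i} is expanded by the shell
printf '%s\n' "$single" "$double"

# The loop from the question, with double quotes around the sed script.
for path in "$workdir"/*.fa; do
    i=$(basename "$path" .fa)    # HC1993.fa -> HC1993
    sed "s/>.*/>${i}/" "$path" > "$workdir/new/${i}.fa"
done

cat "$workdir/new/HC1993.fa"
```

The printf line shows s/>.*/>${i}/ for the single-quoted version, which is why every new file ended up with the header >${i}.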
