Question: Changing names of Fasta headers
12 months ago by
tpaisie70
University of Florida
tpaisie70 wrote:

So I have a directory full of fasta files, and I want to change the fasta header in each one to the name of its corresponding fasta file. For example:

HC1993.fa

> X58834
CCTGCATCTGCAA

HC1993.fa

> HC1993
CCTGCATCTGCAA

I have about 50 fasta files like that in a directory that I want to do the same thing to. I've been using this sed command, which works for one file:

sed 's/>.*/>HC1993/' HC1993.fa > new/HC1993.fa

But now I want to loop this command through the directory and this is the command I have been using:

for i in $(ls *.fa | rev | cut -c 4- | rev | uniq)
do
    sed 's/>.*/>${i}/' ${i}.fa > new/${i}.fa
done

This command gives me the following header in all of the new fasta files:

HC1993.fa

>${i}
CCTGCATCTGCAA

Now I know there are a bunch of ways to fix this, but could someone help me fix the bash loop I made? I want to understand what is wrong with my command and how to fix it. Thanks!

bash loop unix sed sequence fasta • 1.5k views
ADD COMMENTlink modified 9 months ago by h.mon24k • written 12 months ago by tpaisie70
12 months ago by
jrj.healey11k
United Kingdom
jrj.healey11k wrote:

As I understand it, you just want to replace the header in each file with that file's name (minus the extension)?

e.g. given:

~/test/seqs$ ls
seq1.fasta  seq2.fasta  seq3.fasta
~/test/seqs$ cat seq*
>tpg|Magnaporthiopsis_incrustans|JF414846
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>tpg|Pyricularia_pennisetigena|AB818016
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>tpg|Inocybe_sororia|EU525947
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC

So: for file in *.fasta ; do sed -i "s/^>.*/>${file%.*}/" "$file"; done

Yields:

~/test/seqs$ for file in *.fasta ; do sed -i "s/^>.*/>${file%.*}/" "$file"; done
~/test/seqs$ cat seq*
>seq1
ACTGTAGTAGCTACGATCGATCAGATGATCACGTAGCATCGATCGATCATCGACTAGTAGATCACTCGACATAGATCCACATCAATAGATCATCATCATCATAATCGATCACTAGCAGC
>seq2
GCAAGNTTCATGACGATGTAGAATGGCTTATCGAAGGGAGCAGGCCAGGGATTGAGGTCCGTCTCACGGGTTGGCTTCACTCCCCCACTGCCAGCCCTCTTGCTGCAACTCCACCAGAA
>seq3
AACCANGCCGCGACGGCGGTGCGATCGGGAAACGCGGCGGTGGCGGAGGAATCGGCCATCCTTCACCATATCGGCCAAGGATTGTGGTTCCTGTAGGGCTCGCGCAGCCCAGGACGCGC
ADD COMMENTlink modified 12 months ago • written 12 months ago by jrj.healey11k

So yes, your interpretation of what I would like is correct. However, when I ran your command I got this error:

sed: 1: "HC1993.fa": extra characters at the end of H command

And it is not making the new fasta files with the new headers.

ADD REPLYlink written 12 months ago by tpaisie70

Are you using Mac OS?
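
If so, that would explain the error: the BSD sed shipped with Mac OS expects a backup suffix right after -i, so it takes the s/// expression as the suffix and then tries to run the filename "HC1993.fa" as the sed script (hence the complaint about the H command). A rough sketch of a version that avoids -i altogether and should behave the same with GNU and BSD sed is below. The function name rename_headers_inplace is made up for illustration, and it assumes the files end in .fa and that the filenames contain no "/" or "&" characters (which would confuse the sed replacement).

```shell
# rename_headers_inplace DIR
# Hypothetical helper (not from the thread): set every FASTA header line in
# DIR/*.fa to the file's basename, without relying on sed -i.
rename_headers_inplace() {
    dir=${1:-.}
    for file in "$dir"/*.fa; do
        [ -e "$file" ] || continue          # skip if the glob matched nothing
        name=$(basename "$file" .fa)        # HC1993.fa -> HC1993
        # Write to a temporary file, then replace the original on success.
        sed "s/^>.*/>$name/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}
```

Calling rename_headers_inplace . in the directory holding the files would rewrite them in place, so it is worth trying it on a copy first.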

ADD REPLYlink written 12 months ago by jrj.healey11k
12 months ago by
cpad011211k
India
cpad011211k wrote:

Example fasta:

$ cat HC1993.fa 
>X58834 
CCTGCATCTGCAA

Expected output (assuming that the first line in each fasta file is the fasta header):

$ cat HC1993.fa 
>HC1993
CCTGCATCTGCAA

in bash:

$ for i in *.fa; do sed "1s/.*/>${i%.fa}/" "$i"; done
>HC1993
CCTGCATCTGCAA

using GNU-parallel:

$  parallel  'sed "1s/.*/>{.}/" {}' ::: *.fa
>HC1993
CCTGCATCTGCAA
ADD COMMENTlink modified 12 months ago • written 12 months ago by cpad011211k

Ohh thank you so much that worked!!!!

ADD REPLYlink written 12 months ago by tpaisie70

For future reference, the code can be shortened further (note the ">" in the replacement text, since the c command replaces the whole header line):

$ parallel  'sed "/^>/ c >{.}" {}' ::: *.fa
ADD REPLYlink written 9 months ago by cpad011211k
12 months ago by
cpad011211k
India
cpad011211k wrote:

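Try changing sed 's/>.*/>${i}/' to sed "s/>.*/>${i}/". Inside single quotes the shell does not expand ${i}, so sed receives the literal text ${i} and writes it into the header; inside double quotes the variable is expanded before sed runs.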
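
To make the difference concrete, here is a small sketch: two echo lines showing what sed actually receives in each case, followed by the loop from the question with only the quoting changed and the ls | rev | cut pipeline swapped for a plain glob. The function name fix_headers and its two directory arguments are invented for this example; it assumes files ending in .fa and writes the results to a separate output directory, as in the original post.

```shell
# Single quotes keep ${i} literal; double quotes let the shell expand it.
i=HC1993
echo 's/>.*/>${i}/'    # prints: s/>.*/>${i}/
echo "s/>.*/>${i}/"    # prints: s/>.*/>HC1993/

# fix_headers SRC DEST
# Hypothetical wrapper around the asker's loop: for every SRC/*.fa, write a
# copy to DEST whose header lines are replaced by the file's basename.
fix_headers() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for f in "$src"/*.fa; do
        [ -e "$f" ] || continue        # skip if the glob matched nothing
        i=$(basename "$f" .fa)         # HC1993.fa -> HC1993
        sed "s/^>.*/>${i}/" "$f" > "$dest/${i}.fa"
    done
}
```

For the layout in the question this would be run as fix_headers . new from inside the directory with the fasta files; the originals are left untouched.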

ADD COMMENTlink modified 12 months ago • written 12 months ago by cpad011211k