Question: Removing the last part of fasta header in many alignment files
Badh2 wrote (4 weeks ago):

Hello, I'm trying to remove the - symbol and everything after it from the fasta sequence headers in the gene 1 alignment below. I have ~500 gene alignments that need the same treatment. I got this working for a single alignment, but I need help repeating it across all ~500. I would prefer .FNA alignments without the number; writing new output files or changing the original files is fine too. Can someone help me figure this out? I would appreciate an explanation of what each symbol does, so that I can learn. Sorry for the bad formatting in my example alignment.

Thanks!

gene 1

> P_dilatata-COMP100028
ACTGTCTTG
> P_limo-COMP100028
ACTGTCTTC
>P_leuco-COMP100028
ACTGTCTTA

I tried the following, which worked for a single file:

sed '/>/ s/\(.*\)-.*$/\1/g' test.FNA

This loop didn't work; it just keeps running:

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g';
done
JC wrote:

You need to write the for loop like this:

for filename in *.FNA; do
   # trim "-..." from header lines and write the result to <name>_new.FNA
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename" > "${filename%.FNA}_new.FNA"
done

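What you are doing is iterating over all *.FNA files. On each pass the current file name is stored in the filename variable, so when you run sed, pass it the current value of the variable and save the output as a new file. The expansion ${filename%.FNA} removes the trailing .FNA from the name, so gene1.FNA is written to gene1_new.FNA.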
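A minimal sketch of that loop run end to end on a throwaway file, for anyone who wants to try it before touching real data. The temporary directory and the file name gene1.FNA are made up for illustration. The first sequence line contains an alignment gap (-) to show that lines without > are left alone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway working directory (illustrative, not part of the original answer).
workdir=$(mktemp -d)

# A tiny alignment modelled on the example in the question.
printf '>P_dilatata-COMP100028\nACTG-CTTG\n>P_limo-COMP100028\nACTGTCTTC\n' \
    > "$workdir/gene1.FNA"

# The glob is expanded once, before the loop starts, so the *_new.FNA
# files created inside the loop are not picked up during this run.
for filename in "$workdir"/*.FNA; do
    # ${filename%.FNA} strips the trailing ".FNA" from the name.
    out="${filename%.FNA}_new.FNA"
    # Only lines containing ">" are edited; the gapped sequence is untouched.
    sed '/>/ s/\(.*\)-.*$/\1/' "$filename" > "$out"
done

cat "$workdir/gene1_new.FNA"
```

Running the loop a second time in the same directory would also process the *_new.FNA files from the first run, so it may be safer to write the output into a separate directory.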


One can even simplify the sed command like this:

sed '/>/ s/-.*//g'

Which means:

  • in every line that contains a > (/>/)
  • substitute (s/
  • a - followed by zero or more characters (-.*)
  • with nothing (//)
  • and repeat this for every match on the line (g); since -.* already runs to the end of the line, the g changes nothing here and can be dropped
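
One difference between the two commands only shows up when a header contains more than one hyphen. The sketch below uses a made-up header (not from the question's data) to compare them: the greedy \(.*\) in the longer command cuts at the last hyphen, while -.* cuts at the first.

```shell
#!/usr/bin/env bash
# Hypothetical header with two hyphens, for illustration only.
header='>P_a-b-COMP100028'

# Greedy \(.*\) keeps everything up to the LAST hyphen.
last=$(printf '%s\n' "$header" | sed '/>/ s/\(.*\)-.*$/\1/')

# -.* starts at the FIRST hyphen and runs to the end of the line.
first=$(printf '%s\n' "$header" | sed '/>/ s/-.*//')

# With the g flag the result is the same: after the first replacement
# there is nothing left on the line for a second match.
first_g=$(printf '%s\n' "$header" | sed '/>/ s/-.*//g')

printf '%s\n%s\n%s\n' "$last" "$first" "$first_g"
# prints:
# >P_a-b
# >P_a
# >P_a
```

For headers like the ones in the question, with a single hyphen, both commands give the same result.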
written 4 weeks ago by finswimmer

Thank you very much for all the solutions JC and Dave. I tried the first one and it worked perfectly!!

written 4 weeks ago by Badh2
Dave Carlson wrote:

Your loop doesn't supply sed with a file to read, so sed waits for input on standard input, which is why it seems to run forever. This should work:

for filename in *.FNA; do
   # prints the edited alignment to the terminal; redirect with > to save it
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename"
done
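
Since the question says that changing the original files is fine too, here is a hedged sketch of an in-place variant. It assumes a sed that supports the -i option, which is not part of POSIX; the -i.bak form, with the suffix attached directly, is accepted by both GNU and BSD sed and keeps a backup of each original. The temporary directory and the file name gene2.FNA are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway working directory and sample file (illustrative names).
workdir=$(mktemp -d)
printf '>P_leuco-COMP100028\nACTG-CTTA\n' > "$workdir/gene2.FNA"

for filename in "$workdir"/*.FNA; do
    # -i.bak rewrites the file in place and keeps the original
    # next to it as <name>.FNA.bak
    sed -i.bak '/>/ s/-.*//' "$filename"
done

cat "$workdir/gene2.FNA"
```

Once the results have been checked, the .bak files can be deleted.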