Question: Removing the last part of FASTA headers in many alignment files
Badh2 (USA) wrote, 7 months ago:

Hello, I'm trying to remove the - symbol and everything after it from the FASTA sequence headers in the gene 1 alignment below. I have ~500 gene alignments that need the same change. I managed to do it for one alignment, but I need help repeating it for all ~500. I'd like the results as .FNA alignments without the number; writing new output files or modifying the original files are both fine. I would also appreciate an explanation of what each symbol does, so that I can learn. Sorry for the bad formatting in my example alignment.

Thanks!

gene 1

> P_dilatata-COMP100028
ACTGTCTTG
> P_limo-COMP100028
ACTGTCTTC
>P_leuco-COMP100028
ACTGTCTTA

I tried the following, which worked for a single file:

sed '/>/ s/\(.*\)-.*$/\1/g' test.FNA
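Since an explanation of each symbol was requested, here is the single-file command from above broken down piece by piece. This is a small sketch that runs the same sed expression on sample headers adapted from the gene 1 example (extra spaces after > removed) instead of on test.FNA:

```shell
#!/usr/bin/env bash
# Sample alignment adapted from the "gene 1" example in the question.
sample='>P_dilatata-COMP100028
ACTGTCTTG
>P_limo-COMP100028
ACTGTCTTC
>P_leuco-COMP100028
ACTGTCTTA'

# />/          only act on lines containing ">" (the FASTA headers)
# s/A/B/       substitute: replace what matches A with B
# \(.*\)       capture group 1: any characters, as many as possible
# -            a literal hyphen
# .*$          everything after the hyphen, up to the end of the line
# \1           replacement: whatever capture group 1 matched
# g            replace every match on the line (global); optional here
result=$(printf '%s\n' "$sample" | sed '/>/ s/\(.*\)-.*$/\1/g')
printf '%s\n' "$result"
```

Sequence lines contain no >, so they pass through unchanged; each header keeps only the part before its hyphen (for example >P_dilatata).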

This loop didn't work; it just keeps running:

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g';
done
JC (Mexico) wrote:

You need to write the for loop like this:

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename" > "${filename%.FNA}_new.FNA"
done

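What you are doing is iterating over all *.FNA files. On each pass the current file name is stored in the filename variable, so sed reads that file ("$filename") and the output is saved to a new file. ${filename%.FNA} strips the .FNA suffix from the name, so a file called gene1.FNA is written to gene1_new.FNA. The quotes around "$filename" keep file names that contain spaces intact.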
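A minimal, self-contained sketch of the loop above, run in a scratch directory. The files gene1.FNA and gene2.FNA and the header COMP200017 are hypothetical stand-ins for the real ~500 alignments. The case statement is an addition not in the answer: it skips *_new.FNA files so that running the loop a second time does not produce *_new_new.FNA files.

```shell
#!/usr/bin/env bash
# Scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical example alignments.
printf '>P_limo-COMP100028\nACTGTCTTC\n' > gene1.FNA
printf '>P_leuco-COMP200017\nACTGTCTTA\n' > gene2.FNA

for filename in *.FNA; do
    # Skip outputs of an earlier run so they are not processed again.
    case "$filename" in
        *_new.FNA) continue ;;
    esac
    # ${filename%.FNA} drops the trailing ".FNA": gene1.FNA -> gene1
    sed '/>/ s/\(.*\)-.*$/\1/g' "$filename" > "${filename%.FNA}_new.FNA"
done

ls
```

The glob *.FNA is expanded once, before the loop body runs, so files created during the loop are not picked up in the same run; the skip only matters for reruns.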


One can even simplify the sed command like this:

sed '/>/ s/-.*//g'

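Which means:

  • in every line that contains a > (/>/)
  • substitute (s/)
  • a - followed by zero or more characters of any kind (-.*); because * is greedy, this runs to the end of the line
  • with nothing (//)
  • on every match in the line (g, "global"); since -.* already reaches the end of the line there is only one match, so g is optional here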
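One subtle difference between the two commands: the simplified version cuts at the first hyphen, while the original \(.*\)-.*$ keeps everything up to the last hyphen, because the greedy \(.*\) grabs as much as it can. For headers with a single hyphen, like the ones in the question, both give the same result. A small sketch with a hypothetical header containing two hyphens:

```shell
#!/usr/bin/env bash
# Hypothetical header with two hyphens, to show where the commands differ.
header='>P_limo-COMP100028-v2'

# Greedy \(.*\) keeps everything up to the LAST hyphen.
last=$(printf '%s\n' "$header" | sed '/>/ s/\(.*\)-.*$/\1/')
# -.* matches from the FIRST hyphen to the end of the line.
first=$(printf '%s\n' "$header" | sed '/>/ s/-.*//')

echo "$last"    # >P_limo-COMP100028
echo "$first"   # >P_limo
```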
(Reply by finswimmer, 7 months ago)

Thank you very much for the solutions, JC and Dave. I tried the first one and it worked perfectly!

(Reply by Badh2, 7 months ago)
Dave Carlson (Stony Brook University, NY) wrote:

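Your loop doesn't give sed a file to read. Without one, sed reads from standard input and waits for you to type, which is why the loop seems to keep running. This should work (it prints the edited alignments to the terminal, so redirect the output if you want to save it):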

for filename in *.FNA; do
   sed '/>/ s/\(.*\)-.*$/\1/g' "$filename"
done
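Since modifying the original files was also acceptable, here is a sketch of in-place editing with sed's -i option. It assumes GNU sed; the attached-suffix form -i.bak keeps a backup of each original (gene1.FNA.bak), but check your sed's manual, since BSD/macOS sed treats a bare -i differently. gene1.FNA is a hypothetical file created for the demo.

```shell
#!/usr/bin/env bash
# Assumes GNU sed. Scratch directory so real data is untouched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical input alignment.
printf '>P_dilatata-COMP100028\nACTGTCTTG\n' > gene1.FNA

for filename in *.FNA; do
    # -i.bak: edit the file in place, saving the original as <name>.bak
    sed -i.bak '/>/ s/-.*//' "$filename"
done

cat gene1.FNA
```

Because the backups end in .bak, the *.FNA glob does not match them on a rerun.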