Question: How to remove asterisk characters from translated sequences (FASTA format)?
2.2 years ago, seta (920) wrote:

Hi everybody,

I used TransDecoder to translate my assembled transcriptome, and the translated sequences contain asterisk characters (*) that indicate stop codons. I plan to run InterProScan on this assembly, and the asterisks cause an error. Could you please tell me how to remove these characters from the FASTA file? Also, is removing them the right approach, or do they have to be replaced with a stop codon, and if so, which one? Thanks for any help.

sequencing alignment assembly • 1.2k views
modified 2.1 years ago by Biostar ♦♦ 20 • written 2.2 years ago by seta (920)
sed -i 's/*//g' filename.fasta
written 2.2 years ago by Prakki Rama (2.0k)

At first I thought you were trolling the question-poster, but it turns out that sed (at least as implemented in Cygwin) interprets a '*' at the very start of the pattern as a literal asterisk, since there is nothing before it to repeat. However, it might be safer to do

sed -i 's/\*//g' filename.fasta

just to make it crystal clear to the interpreter that '*' should be treated as a literal '*'.

written 2.1 years ago by Joseph Pearson (360)
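A possible variant of the substitution above, in case a FASTA header line could itself contain an asterisk: restrict the edit to lines that do not start with ">" and write to a new file rather than using -i. This is only a sketch; the temporary files and the sample records are made up for illustration and are not from the original poster's data.

```shell
#!/bin/sh
# Hypothetical temporary files standing in for the real FASTA input/output.
infile=$(mktemp)
outfile=$(mktemp)
trap 'rm -f "$infile" "$outfile"' EXIT

# Made-up protein FASTA with stop-codon asterisks (and one in a header).
printf '>seq1 example*header\nMKV*LLA*\n>seq2\nMAAP*\n' > "$infile"

# On every line that does NOT start with ">", delete all literal asterisks.
# Header lines pass through unchanged; the input file is left untouched.
sed '/^>/!s/\*//g' "$infile" > "$outfile"

cat "$outfile"
```

Writing to a separate output file keeps the original translation around, which may be useful if the stop positions are needed later.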

Indeed, sed can be confusing if one doesn't escape things. Compare echo "fooo*{1}" | sed "s/o*//g", echo "fooo*{1}" | sed "s/o*{1}//g" and echo "fooo*{1}" | sed "s/*{1}//g".

modified 2.1 years ago • written 2.1 years ago by Devon Ryan (73k)
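The three commands in the comparison above can be run side by side as follows. This is a small sketch assuming a sed that uses basic regular expressions by default (as GNU sed does); single quotes are used here instead of the double quotes in the comment, which makes no difference for these patterns.

```shell
#!/bin/sh
# The same input string is piped through three slightly different patterns.
input='fooo*{1}'

# "o*" means "zero or more o": the run of o's is removed, the asterisk stays.
a=$(printf '%s\n' "$input" | sed 's/o*//g')

# In a basic regex "{1}" is plain text, so this matches zero or more o's
# followed by the literal characters {1}; only "{1}" is removed here.
b=$(printf '%s\n' "$input" | sed 's/o*{1}//g')

# A "*" at the start of the pattern has nothing to repeat, so it is taken
# literally and the whole "*{1}" suffix is removed.
c=$(printf '%s\n' "$input" | sed 's/*{1}//g')

printf '%s\n' "$a" "$b" "$c"
```

With GNU sed this should print f*{1}, then fooo*, then fooo, which shows how the meaning of an unescaped asterisk depends on what precedes it.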

man tr

written 2.2 years ago by Pierre Lindenbaum (101k)
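For readers who follow the pointer to the tr manual page, one way it could be applied is with the -d option, which deletes every occurrence of the listed characters. The sketch below uses made-up sample records and hypothetical temporary files; note that tr works on the whole stream, so it would also strip asterisks from header lines.

```shell
#!/bin/sh
# Hypothetical temporary files standing in for the real FASTA input/output.
infile=$(mktemp)
outfile=$(mktemp)
trap 'rm -f "$infile" "$outfile"' EXIT

# Made-up protein FASTA with stop-codon asterisks.
printf '>seq1\nMKV*LLA*\n>seq2\nMAAP*\n' > "$infile"

# tr reads standard input and writes standard output; -d deletes each
# character in the given set. It cannot edit in place, so redirect to a
# new file instead of overwriting the input.
tr -d '*' < "$infile" > "$outfile"

cat "$outfile"
```

Because tr treats its argument as a set of characters rather than a regular expression, the asterisk needs only shell quoting and no regex escaping.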