rename the contents of a file
2.7 years ago

I extracted genes from a Prokka .ffn file and found copies of the same gene sequences, but with different IDs. I wanted to know how I could change the ID of each of the ???

Here is an example of what I did, replacing (.fasta_*_) with (_):

>2549870-Q2398_S6.fasta_00234_Adenylate_kinase


sed -i 's:.fasta_*******_:_:g'

>2549870-Q2398_S6_Adenylate_kinase


My question: what should I replace the stars with so that sed matches any number?

2.7 years ago
Joe 20k

You can replace the stars with [0-9], which is a character class that matches a single digit.
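A minimal sketch of that suggestion, using a hypothetical file genes.ffn that holds only the example header. Because [0-9] matches exactly one digit, a five-digit number such as 00234 needs five copies of it in sed's default (basic) syntax. The leading dot is also escaped as \. here, since an unescaped . matches any character. This assumes GNU sed; BSD/macOS sed expects `-i ''` instead of `-i`.

```shell
# Hypothetical input file with one Prokka-style header, for illustration only.
printf '>2549870-Q2398_S6.fasta_00234_Adenylate_kinase\nATGC\n' > genes.ffn

# Each [0-9] matches exactly one digit, so five are needed for "00234".
# "\." matches a literal dot; an unescaped "." would match any character.
sed -i 's/\.fasta_[0-9][0-9][0-9][0-9][0-9]_/_/g' genes.ffn

# Show the renamed header line.
head -n 1 genes.ffn
```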

Depending on whether there are always exactly 5 digits or not, you might want to use [0-9]{5}, or perhaps just any number of digits between those underscores ([0-9]+). We don't know how consistent your files are going to be, though, so it's impossible to say which you need; it's a balance of strictness versus 'sensitivity'. Note that sed uses basic regular expressions by default, so these forms need sed -E (or, in basic mode, escaping such as [0-9]\{5\}).
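The two options can be compared on a pair of headers. This is a sketch assuming a sed that supports -E (extended regular expressions, available in GNU and BSD sed), where {5} and + work unescaped; in basic mode they would have to be written \{5\} and \+ (the latter being a GNU extension). The second header, with a six-digit number, is invented purely to show where the strict pattern stops matching.

```shell
# Hypothetical headers; the second has a six-digit number to show the difference.
headers='>2549870-Q2398_S6.fasta_00234_Adenylate_kinase
>2549870-Q2398_S6.fasta_001234_Adenylate_kinase'

# Strict: exactly five digits. The six-digit header is left untouched.
strict=$(printf '%s\n' "$headers" | sed -E 's/\.fasta_[0-9]{5}_/_/')

# Lenient: one or more digits. Both headers are renamed.
lenient=$(printf '%s\n' "$headers" | sed -E 's/\.fasta_[0-9]+_/_/')

printf 'strict:\n%s\nlenient:\n%s\n' "$strict" "$lenient"
```

The strict form is safer if an unexpected header format would indicate a problem you want to notice; the lenient form is simpler if the digit count varies between files.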