Question

Edit or replace certain character in fasta header from a large fasta sequence file

2.2 years ago

Pratanu ▴ 10

Hi all, I have a file of 2.5 million COVID-19 sequences, and I want to edit or replace certain characters in the FASTA headers of those sequences.

Can anyone please suggest an algorithm for doing so?

Thank You

python R • 916 views

updated 2.2 years ago by Wayne ★ 2.0k • written 2.2 years ago by Pratanu ▴ 10

score 0 · Answer 1 · 2022-01-27

You tagged this Python and R, and either would work. Most likely, though, sed would be among the fastest options. The find-and-replace syntax looks like this:

sed -i -e 's/abc/XYZ/g' file.txt

Recently, though, I saw sd recommended for plain find and replace (sed can do much more than that). I haven't tried sd myself yet, but it comes with examples.
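Since the question is tagged Python, here is a minimal sketch of the same idea in that language. It assumes a plain-text (uncompressed) FASTA file in which every header line starts with ">", and it streams the file line by line so 2.5 million records never have to sit in memory at once. The function names `rewrite_headers` and `rewrite_fasta`, and the sample records, are made up for illustration.

```python
import io


def rewrite_headers(lines, old, new):
    """Yield lines unchanged, except that `old` becomes `new` on header lines."""
    for line in lines:
        if line.startswith(">"):  # FASTA headers begin with ">"
            line = line.replace(old, new)
        yield line


def rewrite_fasta(in_path, out_path, old, new):
    """Stream one FASTA file into another, editing only the headers."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(rewrite_headers(src, old, new))


# Small in-memory demo: "ACG" occurs in headers and in sequence lines,
# but only the headers are rewritten.
demo = io.StringIO(">sample_ACG_1\nACGTACGT\n>sample_ACG_2\nTTGA\n")
result = "".join(rewrite_headers(demo, "ACG", "xyz"))
print(result)
```

Restricting the replacement to header lines matters because a plain substitution would also touch any sequence line that happens to contain the pattern. The sed command can be limited the same way by prefixing an address, e.g. sed -e '/^>/ s/abc/XYZ/g' file.txt.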