Edit or replace certain characters in FASTA headers of a large FASTA sequence file
2.2 years ago
Pratanu ▴ 10

Hi all, I have a file of 2.5 million COVID-19 sequences, and I want to edit or replace certain characters in the FASTA headers of those sequences.

Can anyone suggest an algorithm or approach for doing this?

Thank you

python R • 916 views
2.2 years ago
Wayne ★ 2.0k

You could do this in Python or R, though sed would most likely be among the fastest options. The find and replace syntax is spelled out here.

sed -i -e 's/abc/XYZ/g' file.txt

I recently saw a recommendation for sd as an alternative for the find and replace part (sed can do more than just that). I haven't tried sd myself yet, but its page has examples.
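Since the question is tagged Python, here is a minimal sketch of the same find and replace done by streaming the file line by line, so the 2.5 million sequences never need to be held in memory. It only touches lines that start with `>`, so the sequence lines are left alone. The function name `replace_in_headers` and the file names in the usage note are made up for illustration, and it assumes a plain (uncompressed) text FASTA file.

```python
def replace_in_headers(in_path, out_path, old, new):
    """Copy a FASTA file, replacing `old` with `new` on header lines only.

    Returns the number of header lines that were changed.
    """
    changed = 0
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            # FASTA headers start with '>'; sequence lines are copied as-is.
            if line.startswith(">") and old in line:
                line = line.replace(old, new)
                changed += 1
            dst.write(line)
    return changed
```

A call would look like `replace_in_headers("sequences.fasta", "sequences.renamed.fasta", "abc", "XYZ")`. Writing to a new file rather than editing in place keeps the original intact until the result has been checked. With sed, the same restriction to header lines should be possible by putting the address `/^>/` in front of the `s` command.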
