Question: Merging all sequences with identical IDs
13 months ago, hugo.swenson wrote:

Hi!

I am having trouble with multiple genes (FASTA files) that I am supposed to concatenate. All of these genes use identical taxon identifiers, so after concatenating my aligned and trimmed files I end up with duplicate headers in the combined file. Is there a method, preferably in Python, to merge all sequences that share an identical header into one sequence (i.e. remove the duplicate header entries and join all sequences matching that header into a single sequence)?

sequence • 644 views

Please provide an example.

written 13 months ago by shenwei356

Ha, just realized, I recommended your tool :)

written 13 months ago by thackl
13 months ago, thackl (MIT) wrote:

seqkit concat might do what you want: "concatenate sequences with same ID from multiple files"

https://github.com/shenwei356/seqkit
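
Since the question asks for a Python approach, here is a minimal sketch of the same idea using only the standard library. It is not how seqkit implements it; the function names read_fasta and merge_by_id and the taxonA/taxonB records are made up for illustration. It assumes the ID is the first whitespace-delimited token of each header and that the sequences should be joined in the order they appear in the combined file.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Assumption: the ID is the first token after '>'.
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_by_id(records):
    """Join sequences that share an ID, keeping first-seen ID order."""
    merged = {}
    for rec_id, seq in records:
        merged.setdefault(rec_id, []).append(seq)
    return {rec_id: "".join(parts) for rec_id, parts in merged.items()}


# Hypothetical combined file: two genes, same two taxon IDs in each.
combined = io.StringIO(
    ">taxonA\nACGT\n>taxonB\nAC-T\n"
    ">taxonA\nGGCC\n>taxonB\nGG-C\n"
)

for rec_id, seq in merge_by_id(read_fasta(combined)).items():
    print(f">{rec_id}")
    print(seq)
```

Note that this sketch does not pad missing data: if a taxon is absent from one of the gene alignments, its merged sequence will be shorter than the others, so check that every ID occurs once per gene before relying on the result as an alignment.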
