Question: Merging all sequences with identical IDs
3 months ago, hugo.swenson (0) wrote:

Hi!

I have multiple genes (FASTA files) that I am supposed to concatenate. The problem is that all of these genes use identical taxon identifiers, so after concatenating my aligned and trimmed files I end up with many duplicate headers in the combined file. Is there a method, preferably in Python, to merge all sequences with an identical header into one sequence (i.e. remove the duplicate header entries and join all sequences matching that header into a single sequence)?


Please provide an example.

written 3 months ago by shenwei356 (4.5k)

Ha, just realized, I recommended your tool :)

written 3 months ago by thackl (2.6k)
3 months ago, thackl (2.6k, MIT) wrote:

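seqkit concat might do what you want. Its description reads: "concatenate sequences with same ID from multiple files", so it would be run on the separate per-gene files rather than on the already-combined file.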

https://github.com/shenwei356/seqkit
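
Since the question asks for a Python route, here is a minimal sketch using only the standard library. It assumes the combined file is plain FASTA, that the ID is the first whitespace-delimited token of each header, and that the pieces should be joined in the order they appear in the file. The function names (read_fasta, merge_by_id, write_fasta) are made up for illustration and are not part of seqkit or any other library.

```python
import io


def read_fasta(handle):
    """Yield (seq_id, sequence) pairs from an open FASTA handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first token of the header line.
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def merge_by_id(records):
    """Join all sequences that share an ID, keeping first-seen order."""
    merged = {}
    for seq_id, seq in records:
        merged.setdefault(seq_id, []).append(seq)
    return {seq_id: "".join(parts) for seq_id, parts in merged.items()}


def write_fasta(merged, handle, width=60):
    """Write one record per ID, wrapping sequence lines at `width`."""
    for seq_id, seq in merged.items():
        handle.write(f">{seq_id}\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Tiny in-memory stand-in for a combined file with duplicate headers.
    combined = io.StringIO(">taxA\nACGT\n>taxB\nAC-T\n>taxA\nGGTT\n>taxB\nGGCC\n")
    out = io.StringIO()
    write_fasta(merge_by_id(read_fasta(combined)), out)
    print(out.getvalue(), end="")
```

With real files, the io.StringIO objects would be replaced by open(...) handles. One caveat: this only gives a valid concatenated alignment if every taxon is present in every gene. If a taxon is missing from one gene, its merged sequence ends up shorter than the others, so that gene would first need to be padded with gaps for the missing taxon.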
