Question: Removing duplicate protein sequences from a FASTA file
8 months ago, saadleeshehreen wrote:

Hi, I want to run psiblast on the command line. Before running psiblast, I tried to build a BLAST database with the makeblastdb command, but it fails with the following error:

BLAST Database creation error: Error: Duplicate seq_ids are found:
REF|WP_003261842.1

My FASTA file is very large, so deleting the duplicate sequences by hand is practically impossible. How can I get rid of the duplicates?

Cheers

8 months ago, finswimmer (Germany) wrote:

Use seqkit:

$ seqkit rmdup input.fa > output.fa
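
For readers who cannot install seqkit, the same idea can be sketched in a few lines of Python. seqkit rmdup compares sequence IDs by default (its -s/--by-seq option compares the sequences themselves instead), so this sketch also keeps only the first record for each ID. It assumes the ID is the header text up to the first whitespace, which may not match how makeblastdb parses IDs in every case. The function names dedupe_fasta_by_id and dedupe_file and the second accession in the demo are hypothetical, chosen only for illustration.

```python
import io


def dedupe_fasta_by_id(lines):
    """Yield FASTA lines, keeping only the first record seen for each ID.

    Assumption: the ID is the header text between '>' and the first
    whitespace character. Lines before the first header are dropped.
    """
    seen = set()
    keep = False
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            seq_id = parts[0] if parts else ""
            keep = seq_id not in seen  # skip records whose ID was already seen
            seen.add(seq_id)
        if keep:
            yield line


def dedupe_file(in_path, out_path):
    """Stream a FASTA file to a new file without loading it all into memory."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(dedupe_fasta_by_id(src))


# Small in-memory demo; WP_000000001.1 is a made-up accession.
example = io.StringIO(
    ">ref|WP_003261842.1| protein A\nMKT\nAAG\n"
    ">ref|WP_000000001.1| protein B\nMSS\n"
    ">ref|WP_003261842.1| protein A copy\nMKT\nAAG\n"
)
deduped = "".join(dedupe_fasta_by_id(example))
print(deduped)
```

For a real file, something like dedupe_file("input.fa", "output.fa") would write the filtered records to output.fa. Memory use grows with the number of distinct IDs, not with the total sequence length, because only the IDs are stored.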

Thanks a lot! It works.

written 8 months ago by saadleeshehreen

Then please accept his answer. There's a little tick box.

written 8 months ago by Emily_Ensembl

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

written 8 months ago by RamRS