2.3 years ago
rupjandu
Hello everyone, I am using the web-based Galaxy tool, not the command line version. I merged FASTA files into one and I'm trying to construct a BLAST database with these local sequences through the makeblastdb function. I get an error message that reads, "Error: Duplicate seq_ids are found: GNL|BL_ORD_ID|9650923".
Can anyone suggest a way to remove the duplicate seq IDs, preferably using the web-based Galaxy tool?
Thank you!
If you need Galaxy-specific assistance, please post this on their help forum: https://help.galaxyproject.org/
1. Run the following on the downloaded file (input.fa). This should print the duplicated/identical headers and their counts.

    sed -nr '/^>/p' input.fa | sort -V | uniq -D | uniq -c

2. Install the seqkit tool and run the command below. This would generate a new file, output.fa, and append serial numbers to the end of FASTA IDs/headers that are identical.

    seqkit rename -n input.fa -o output.fa

3. Run the same check on the new file (output.fa). This should not print any lines.

    sed -nr '/^>/p' output.fa | sort -V | uniq -D | uniq -c
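If seqkit is not available, the renaming step can be approximated with awk. This is only a minimal sketch, not seqkit's own implementation. The function name `dedupe_fasta_ids` is hypothetical. It assumes the ID is the first whitespace-delimited token of each header line, and it appends `_2`, `_3`, and so on to repeated IDs. It does not check whether a generated ID collides with one already present in the file.

```shell
# Hypothetical helper (not part of seqkit or BLAST): make repeated FASTA IDs unique.
# Assumption: the ID is the first whitespace-delimited token of a ">" header line.
dedupe_fasta_ids() {
  awk '
    /^>/ {
      id = $1                      # e.g. ">seq1" (the leading ">" is included)
      n = ++seen[id]               # how many times this ID has appeared so far
      if (n > 1)                   # from the second occurrence on, add a suffix
        $0 = id "_" n substr($0, length(id) + 1)
    }
    { print }                      # sequence lines pass through unchanged
  ' "$1"
}
```

It could be used as `dedupe_fasta_ids input.fa > output.fa`, followed by the same duplicate check on output.fa to confirm that nothing is reported.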