I have around 1,000 nucleotide sequences, each representing a bacterial gene cluster. Each sequence is around 30 kb long, and they come from many different genera (some are even metagenomic). Is there a convenient way to accurately annotate these gene clusters in batch and retrieve a GenBank or FASTA file for each cluster?
I've had success with RAST, but only for one sequence at a time. I've also tried Prokka, which works well for finding ORFs, but it annotates many of the predicted genes only as "hypothetical protein".