Criteria for classifying circular sequences
6 weeks ago
I've recently started working with circular genomes from GenBank. I'd like to establish criteria for labeling a sequence as "circular," independent of the original GenBank classification. I will have annotated genes available as a reference.

Option 1: All expected genes must be present to classify a genome as "circular". One or more missing genes would indicate an incomplete genome, which would be labeled "linear".

Option 2: All expected genes must be present AND COMPLETE. The "circular" label is applied only if the genome is considered complete. This can be problematic if the origin falls within a gene, effectively slicing it in half. To call such a gene incomplete, I would have to demonstrate that its central residues are likely missing, and labeling a sequence "linear" with gene segments on either end seems unintuitive.

Option 3: Always label the sequence as "circular", even if it's a partial genome. This wouldn't make much sense for single genes or short reads.

Option 4: Keep the original classification. The accuracy of GenBank submissions can be dubious, so I'd prefer to apply my own criteria.
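
To make Options 1 and 2 concrete, here is a rough sketch of how they could be written as a single rule. The function name, the `require_complete` flag, and the dictionary of gene name to completeness are all hypothetical choices for illustration, not an existing tool or GenBank convention.

```python
def classify_topology(expected_genes, observed_genes, require_complete=False):
    """Label a sequence "circular" or "linear" under Option 1 or Option 2.

    expected_genes: iterable of gene names expected from the reference
                    (hypothetical input).
    observed_genes: dict mapping gene name -> True if the gene is complete,
                    False if it is only partially present (hypothetical).
    require_complete: False applies Option 1, True applies Option 2.
    """
    expected = set(expected_genes)

    # Option 1: any expected gene that is absent makes the genome incomplete.
    if expected - set(observed_genes):
        return "linear"

    # Option 2: additionally reject genomes with partial expected genes.
    if require_complete and any(not observed_genes[g] for g in expected):
        return "linear"

    return "circular"
```

Under Option 2, a gene that is split by the origin would have to be recognised and marked complete before calling this function; the sketch does not attempt that step.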

6 weeks ago
I read this twice and am still not sure what your question is, or whether there is one.

When a genome is labeled circular, it means a complete genome that also happens to be circular. Plasmids are also labeled circular even though they are extrachromosomal elements, and a plasmid labeled circular would likewise be considered complete. A genome can be complete and still be labeled linear, because many organisms have linear chromosomes. In other words, linear doesn't automatically mean incomplete. However, if a genome that is expected to be circular is labeled linear, it is most likely incomplete.
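
As a small illustration of that last rule, the topology label can be read from the LOCUS line of a GenBank flat file and compared against what is expected for the organism. This is only a sketch: the function names are made up, the parser simply looks for the words "circular" or "linear" rather than relying on fixed column positions, and the LOCUS line in the usage example is invented.

```python
def locus_topology(locus_line):
    """Return "circular", "linear", or None from a GenBank LOCUS line."""
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        return None
    # Skip "LOCUS" and the locus name so a name is never read as a topology.
    for token in tokens[2:]:
        if token in ("circular", "linear"):
            return token
    return None


def likely_incomplete(locus_line, expected_circular):
    """Flag a record expected to be circular but labeled linear."""
    return expected_circular and locus_topology(locus_line) == "linear"


# Invented LOCUS line, for illustration only.
example = "LOCUS       EXAMPLE01               5386 bp    DNA     linear   PHG 01-JAN-2000"
print(likely_incomplete(example, expected_circular=True))
```

Whether a given organism is expected to be circular has to come from outside knowledge; the flat file alone can't tell you that.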