I want to count the number of contigs in either a FASTA or a GenBank (GBK) file. I am aware of tools such as CheckM that can do this, but is there a direct way to count the contigs in a sequence file with Python or Biopython?
Easy in BioPython.
    from Bio import SeqIO
    recs = list(SeqIO.parse('genbank.gbk', 'genbank'))
    len(recs)
This could be more memory efficient with an iterator, but this is a quick and easy way.
This is likely a more robust solution too, since *nix command-line solutions require that you know your files very well and can be sure they don't have any nasty surprises in them.
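For the iterator variant mentioned above, here is a minimal sketch. It assumes each record in the file corresponds to one contig. The helper names `count_records` and `count_contigs` are made up for illustration; only `SeqIO.parse` comes from Biopython.

```python
def count_records(records):
    """Count the items of any iterator without storing them in a list."""
    return sum(1 for _ in records)


def count_contigs(path, fmt):
    """Count the records in a sequence file; fmt is e.g. 'fasta' or 'genbank'.

    Assumes one record per contig (hypothetical helper, not part of Biopython).
    """
    # Imported here so that count_records stays usable without Biopython.
    from Bio import SeqIO

    # SeqIO.parse yields one record at a time, so only one is in memory.
    return count_records(SeqIO.parse(path, fmt))
```

Calling `count_contigs('genbank.gbk', 'genbank')` should give the same number as `len(recs)` above, and `count_contigs('contigs.fasta', 'fasta')` would do the same for a FASTA file (that file name is only a placeholder).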