Question

BiG-SCAPE total number of genomes issue

21 months ago

allenmbrooke • 0

I'm trying to run BiG-SCAPE (1.1.5) on antiSMASH (7.0.1) output from 40 fungal genomes, but it counts every antiSMASH NODE.gbk file inside the genome directories as a genome of its own. So instead of 40 genomes with ~30 NODE.gbk files each, it reports 1200 genomes and 1200 NODE files, and I can't figure out why. Could someone explain, or show me an example of, how to organize antiSMASH output so it can be used as input for BiG-SCAPE?

Right now I have 40 directories, each named only with an ID number ("106", for example). Each directory contains the antiSMASH output files "106_spades_scaffolds.gbk", "106_spades_scaffolds.json", and "106_spades_scaffolds.zip", plus multiple region files such as "106_NODE_210_length_52287_cov_51.201034.region001.gbk". I have the NODE.gbk files from all 40 genomes together in one directory, and that directory is the input for BiG-SCAPE, but something must be missing that tells it how many genomes there are.
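To make the naming concrete, here is a minimal sketch of how the region files could be grouped back into genomes by their ID prefix. It assumes every region file follows the `<ID>_NODE_...regionNNN.gbk` pattern shown above. The function name `group_regions_by_genome` is hypothetical and is not part of BiG-SCAPE or antiSMASH; this only checks the file layout, not how BiG-SCAPE itself counts genomes.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern (taken from the example file name in this post):
#   <genomeID>_NODE_<...>.region<NNN>.gbk
REGION_RE = re.compile(r"^(?P<genome>.+?)_NODE_.+\.region\d+\.gbk$")


def group_regions_by_genome(filenames):
    """Map each genome ID prefix to the list of its region file names."""
    groups = defaultdict(list)
    for name in filenames:
        base = Path(name).name
        match = REGION_RE.match(base)
        if match:
            # Whole-genome files such as 106_spades_scaffolds.gbk do not
            # match the pattern and are skipped.
            groups[match.group("genome")].append(base)
    return dict(groups)


# Small demonstration with made-up file names in the same style.
example_files = [
    "106_NODE_210_length_52287_cov_51.201034.region001.gbk",
    "106_NODE_3_length_90000_cov_40.5.region001.gbk",
    "107_NODE_12_length_70000_cov_33.1.region001.gbk",
    "106_spades_scaffolds.gbk",
]
groups = group_regions_by_genome(example_files)
n_regions = sum(len(files) for files in groups.values())
print(f"{len(groups)} genomes, {n_regions} region files")
```

For the real data, the list of names could come from something like `Path("bigscape_input").glob("*.gbk")`, where `bigscape_input` stands in for the combined input directory. With my files I would expect this to report 40 genomes, which is the number I would like BiG-SCAPE to show.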

I found one tutorial where both genome and biosynthetic gene cluster (BGC) information is stored in gbk files and used as the input, but its "How to compile your own input dataset" section doesn't mention the genome gbk files. I don't know where those genome gbk files come from, or whether I would need to generate them myself from my original FASTA files (these genomes are not on GenBank).

genomics • 622 views
