I'm new to genomics and bioinformatics and was hoping to get some advice from seasoned bioinformaticians.
My lab has sequenced the genomes of several dozen strains of Bacillus subtilis, and we've gotten the whole-genome short reads (paired-end, 150 bp, Illumina HiSeq) back. We now need to assemble their genomes. After that we want to do a comparative genomic analysis of them: compare gene content, function, differential regimes of positive selection on genes, shared/unique genes, known/novel genes responsible for ecological adaptations in nature, etc.
Our lab has done comparative studies on these strains before based on phylogenetic analyses of one or three housekeeping genes, but this is our first time getting our hands on their whole genome sequences - and I don't have any experience doing genome assembly.
There are plenty of B. subtilis reference strains available, so I'm guessing a reference-guided assembly would be our safest bet (I'm not sure why we would want to do de novo assembly if we have references available). I'm not familiar with the software or reference-guided assembly pipelines out there. Do you guys have any suggestions for software or pipelines/approaches we should use to assemble our genomes?
Excited to join the field, thanks!