I am pretty new to genome assembly, and to mira3 in particular, and I have a couple of questions about it.
What exactly is the template size in mira3? I couldn't find a proper definition of it in the manual. My fragment size before ligating adapters is ~250 bp, and after library construction it was found to be ~350 bp. My read length is 260 bp. In this scenario, what is the template size I should give in the configuration file?
I am working with bacterial MiSeq genome data, and when I used mira3 to assemble the reads, I found ~1400 contigs in the final results file. Do you think that is too many? Could this have happened because I specified the wrong template size in the configuration file for the mira3 assembly run?
The template size usually refers to the distance between the 5' ends of the two reads in a pair, in other words the length of the DNA between the adaptors. So in your case it would be ~250 bp.
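To make the arithmetic concrete, here is a minimal sketch using only the numbers from the question. The function name `adaptor_readthrough` is made up for illustration, and the sketch assumes the ~100 bp difference between the two fragment sizes is entirely adaptor sequence.

```python
def adaptor_readthrough(template_bp: int, read_bp: int) -> int:
    """Bases at the 3' end of a read that extend past the template into the adaptor."""
    return max(0, read_bp - template_bp)


# Values taken from the question.
library_bp = 350   # fragment size after library construction (template + adaptors)
template_bp = 250  # fragment size before adaptor ligation (the template size)
read_bp = 260      # read length

# Assumption: everything added during library construction is adaptor.
adaptor_total_bp = library_bp - template_bp
overhang_bp = adaptor_readthrough(template_bp, read_bp)

print(f"template size: ~{template_bp} bp")
print(f"adaptor sequence in the library: ~{adaptor_total_bp} bp")
print(f"each read runs ~{overhang_bp} bp into the adaptor")
```

Since these are approximate averages, individual fragments shorter than 250 bp would have even more adaptor sequence at the ends of their reads.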
I've never used mira for Illumina data, so I can't comment on whether 1400 contigs is good or not. It also depends on how repetitive the genome you're trying to sequence is.
Another thing to consider is preprocessing your data. From your description, your read length is as long as (or longer than) the template length, which means that your reads may be running into the adaptor sequence. This will cause serious problems for the genome assembler, as many of the reads will end in the same DNA sequence, which doesn't even originate from your genome! I would suggest a tool like SeqPrep, which trims adaptors and filters reads based on quality. After preprocessing you could give the assembly another go and see if you get better results.
Finally, it never hurts to get a second opinion. There are heaps of de novo assemblers out there for Illumina data, so you could try one and compare the results. I would suggest SPAdes, Ray, or Velvet.
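To illustrate what adaptor read-through looks like and what trimming does about it, here is a toy sketch. It is not how SeqPrep works internally: it only does exact matching, with no mismatch tolerance and no quality filtering. The function `trim_adaptor` and the `ADAPTOR` value are assumptions for illustration; use the adaptor sequence documented for your own library kit.

```python
# Assumption: a commonly used Illumina adaptor prefix; replace with your kit's sequence.
ADAPTOR = "AGATCGGAAGAGC"


def trim_adaptor(read: str, adaptor: str, min_overlap: int = 3) -> str:
    """Remove adaptor sequence from the 3' end of a read (exact matches only)."""
    # Case 1: the whole adaptor occurs in the read; cut from where it starts.
    idx = read.find(adaptor)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adaptor; look for the longest
    # adaptor prefix (at least min_overlap bases) that the read ends with.
    longest = min(len(adaptor) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adaptor[:k]):
            return read[:-k]
    # No adaptor found: return the read unchanged.
    return read


genomic = "ACGTACGT"
print(trim_adaptor(genomic + ADAPTOR + "TTT", ADAPTOR))  # full adaptor inside the read
print(trim_adaptor(genomic + ADAPTOR[:6], ADAPTOR))      # read ends inside the adaptor
print(trim_adaptor(genomic, ADAPTOR))                    # no adaptor present
```

All three calls print the same genomic sequence. A very small `min_overlap` will also clip genuine genomic bases that happen to match the start of the adaptor, which is one reason dedicated trimmers use paired-end overlap and quality scores instead of exact suffix matching.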