Hi everyone! I'm quite new to this field and need help, because I don't understand the whole pipeline for my task. My lab has sequenced (Illumina, paired-end) two strains of M. tuberculosis. One of them is expected to be the control; the second contains mutations that I have to find. These mutations could be SNPs, deletions, or large translocations. I tried to assemble the genomes de novo using SPAdes (within Unicycler), but there are a lot of contigs and it's difficult to compare the two assemblies. Now I'm starting to think that I could use the M. tuberculosis genome from NCBI (my control strain should be very similar to it). But I don't really understand whether that is correct in this case. If so, should I use reference-guided assembly and provide the M. tuberculosis genome as trusted contigs to SPAdes? Or should I just map my final contigs to the reference genome? My second thought was to sequence my strains again with Nanopore to generate long reads and finish the assembly. Please tell me which pipeline I should use in my case. How can I find the differences without accidentally losing information during assembly? Thanks to all!
Hi! Do not use a reference genome as trusted contigs in SPAdes; it will probably introduce errors into your assembly. Try some in silico gap filling (here another answer that may help you) to try to close your control genome. But if you have a good N50, you don't really need to close it or build a scaffold to find de novo mutations. You can do variant calling with contigs. Sometimes the process of closing a genome in silico introduces errors that are mistaken for de novo mutations, so do consider whether you really need to close your genome.
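
In case it helps to judge whether the N50 is "good", here is a minimal Python sketch that computes it from contig lengths. The function names (`contig_lengths_from_fasta`, `n50`) and the toy sequences are made up for illustration; this is not part of SPAdes or Unicycler, and assembly QC tools report the same statistic.

```python
def contig_lengths_from_fasta(lines):
    """Return the length of each contig from an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the contig being read, None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header starts a new contig; store the previous one first.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs given")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy example with hypothetical contigs (real input would be the assembly FASTA).
toy_fasta = [">contig_1", "ACGTACGTAC", "ACGT", ">contig_2", "ACGTAC", ">contig_3", "ACG"]
toy_lengths = contig_lengths_from_fasta(toy_fasta)
toy_n50 = n50(toy_lengths)
```

For a real assembly you would pass an open file handle, for example `contig_lengths_from_fasta(open("contigs.fasta"))`, where the file name is only a placeholder for wherever your contigs were written.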