I have recently started working on the genome assembly of a fungus. I have Illumina paired-end short-read sequencing data (2x150 bp) downloaded from NCBI. Using this data, I am trying to set up a genome assembly pipeline that can later be used for our upcoming sequencing data.
After going through multiple papers and tutorials, I put together this workflow:
- FastQC quality check and read trimming (if needed)
- De novo genome assembly with SPAdes (as no reference genome is available) -> contigs.fasta
- Quality check of contigs.fasta with QUAST and BUSCO
- RepeatMasking and RepeatModeling
- Annotation of the assembly
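For the QUAST step, it can help to know what the headline contiguity numbers mean. The Python sketch below computes the number of contigs, the total assembly length and the N50 (the length of the contig at which the running total of contigs, sorted longest first, reaches half the assembly size) from a FASTA file such as contigs.fasta. The function names are made up for illustration; QUAST reports these and many more metrics, and by default it ignores short contigs, so its numbers may differ from this unfiltered sketch.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record_id: sequence}."""
    records = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if name is not None:
                records[name] = "".join(chunks)
            # Keep only the first word of the header as the record ID.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def assembly_stats(lengths):
    """Return contig count, total length and N50 for a list of contig lengths."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    running = 0
    n50 = 0
    for length in sorted_lengths:
        running += length
        # N50: first contig at which the cumulative length reaches half the total.
        if running * 2 >= total:
            n50 = length
            break
    return {"contigs": len(sorted_lengths), "total_length": total, "n50": n50}
```

To use it on the SPAdes output, read the file into a string and pass the sequence lengths, e.g. `assembly_stats([len(s) for s in read_fasta(open("contigs.fasta").read()).values()])`.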
As every tutorial I found ends with these steps, my questions are:
- SPAdes gave me a contigs.fasta file. Is there any method to build scaffolds from this file? Can this be done using only the Illumina short-read data?
- Is it necessary to turn contigs into scaffolds if only short-read data is available, or can contigs.fasta be used directly for further processing?
- Are RepeatMasking and RepeatModeling two different steps, or one?
- Is there any other analysis that should be done?
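To make the contig/scaffold distinction concrete: a scaffold is a set of contigs placed in order and orientation, with the estimated gaps between them written as runs of N characters. The Python sketch below splits a scaffold sequence back into its contigs and lists its gap lengths. The function names and the default threshold of 10 Ns (shorter N runs are treated as part of a contig) are assumptions for illustration, not taken from any particular tool.

```python
import re


def _gap_pattern(min_gap):
    """Compile a regex matching runs of at least `min_gap` N/n characters."""
    if min_gap < 1:
        raise ValueError("min_gap must be at least 1")
    return re.compile(r"[Nn]{%d,}" % min_gap)


def split_scaffold(seq, min_gap=10):
    """Split a scaffold into contigs at N runs of length >= min_gap."""
    # Drop empty strings produced by leading or trailing gaps.
    return [piece for piece in _gap_pattern(min_gap).split(seq) if piece]


def gap_lengths(seq, min_gap=10):
    """Return the length of every N run of length >= min_gap in the scaffold."""
    return [len(m.group()) for m in _gap_pattern(min_gap).finditer(seq)]
```

For example, `split_scaffold("ACGT" + "N" * 10 + "GGCC")` returns `["ACGT", "GGCC"]`, while a shorter run such as `"NNN"` is left inside its contig with the default threshold.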
If these seem like naive questions, please know that I am new to genome assembly and am trying to learn and understand the steps that most tutorials and publications don't mention.