I performed a de novo assembly of a cosmid sequenced on NextSeq PE 300 using SPAdes. The pipeline I used is as follows:
1. Trim the reads to remove low-quality bases.
2. Extract a subset of the reads.
3. Perform de novo assembly with SPAdes.
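For concreteness, step 2 could look something like the following. This is only a minimal sketch, assuming uncompressed four-line FASTQ files with mates in the same order in both files. The function names, the seed and the choice of reservoir sampling are illustrative assumptions, not taken from the actual pipeline.

```python
import random


def read_fastq(handle):
    """Yield one FASTQ record (four lines, as a single string) at a time."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline()
        plus = handle.readline()
        qual = handle.readline()
        yield header + seq + plus + qual


def subsample_pairs(r1_path, r2_path, out1_path, out2_path, n_pairs, seed=42):
    """Randomly keep n_pairs read pairs using reservoir sampling.

    Mates are sampled together, so the two output files stay in sync.
    Returns the number of pairs written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(r1_path) as f1, open(r2_path) as f2:
        for i, pair in enumerate(zip(read_fastq(f1), read_fastq(f2))):
            if i < n_pairs:
                reservoir.append(pair)
            else:
                # Replace an existing entry with probability n_pairs / (i + 1).
                j = rng.randint(0, i)
                if j < n_pairs:
                    reservoir[j] = pair
    with open(out1_path, "w") as o1, open(out2_path, "w") as o2:
        for rec1, rec2 in reservoir:
            o1.write(rec1)
            o2.write(rec2)
    return len(reservoir)
```

A call such as `subsample_pairs("R1.fastq", "R2.fastq", "sub_R1.fastq", "sub_R2.fastq", 50000)` would write 50,000 pairs; the file names and the number here are placeholders.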
The expected length of the cosmid was 50 kb, but the assembled sequence is only around 47.5 kb. This cosmid contains a region that overlaps with another cosmid; the overlapping sequence was PCR amplified and sequenced, which confirmed its presence.
The overlapping sequence is 990 bp long, and it is not present in the assembled sequence.
I have looked through the contigs.fasta file from the SPAdes output, and this sequence is not present in any of the other contigs either.
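A check of this kind can be sketched as below. It is a minimal, hedged example that assumes the 990 bp sequence is available as a plain string and that an exact k-mer comparison on both strands is an acceptable stand-in for a proper alignment (for example with BLAST). The function names and the default `k=31` are illustrative choices.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def kmer_hits(query, target, k=31):
    """Fraction of the query's k-mers found in the target on either strand."""
    query, target = query.upper(), target.upper()
    n_query_kmers = len(query) - k + 1
    if n_query_kmers <= 0:
        return 0.0
    target_kmers = set()
    for strand in (target, reverse_complement(target)):
        for i in range(len(strand) - k + 1):
            target_kmers.add(strand[i:i + k])
    found = sum(query[i:i + k] in target_kmers for i in range(n_query_kmers))
    return found / n_query_kmers


def search_contigs(contigs_path, query, k=31):
    """Return (contig header, fraction of query k-mers found), best hit first."""
    results = [(name, kmer_hits(query, seq, k))
               for name, seq in read_fasta(contigs_path)]
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `search_contigs("contigs.fasta", overlap_seq)` returns one score per contig. A fraction near 1.0 would mean the sequence is essentially all there, a partial fraction would suggest a truncated or diverged copy, and 0.0 everywhere is consistent with the sequence being absent. Here `overlap_seq` is a placeholder for the PCR-confirmed sequence.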
What approach should I use to search for this missing sequence in the raw reads or in the assembly? And how can I explain the absence of this sequence from the assembled cosmid?