A couple of months back, I sequenced a few BACs on a MiSeq and assembled the paired-end reads using SOAPdenovo, but the assembly came out fragmented into many scaffolds. I have now obtained the reference genome of the same cultivar and am trying to pull out my region of interest (about 3.2 Mb).
Here are the two approaches I tried:
I mapped my paired-end reads to the whole genome using BWA. With this approach, reads mapped to only 19 scaffolds of the whole genome.
I blasted (blastn) the SOAPdenovo assembly against the whole genome (e-value: 1000, word size: 40, percent identity: 100%). With this approach, more than 1500 scaffolds of the whole genome got BLAST hits.
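One way to check whether the 1500+ BLAST hits are mostly short matches is to filter them by alignment length and see how many reference scaffolds remain. The following is only a minimal sketch, assuming the blastn results were saved in tabular format (`-outfmt 6`, which is not stated above); the function name, the thresholds, and the example rows are hypothetical.

```python
from collections import defaultdict


def scaffolds_with_long_hits(blast_lines, min_len=1000, min_pident=99.0):
    """Sum aligned length per reference scaffold, keeping only long,
    near-identical hits from BLAST tabular (-outfmt 6) lines."""
    totals = defaultdict(int)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        # Default outfmt 6 columns: qseqid sseqid pident length ...
        sseqid = fields[1]
        pident = float(fields[2])
        length = int(fields[3])
        if length >= min_len and pident >= min_pident:
            totals[sseqid] += length
    return dict(totals)


# Made-up rows for illustration only (not real results).
example = [
    "bac_scaf1\tref_scaf7\t100.00\t15000\t0\t0\t1\t15000\t200\t15199\t0.0\t27700",
    "bac_scaf1\tref_scaf912\t100.00\t42\t0\t0\t300\t341\t5000\t5041\t2e-15\t78.7",
    "bac_scaf2\tref_scaf7\t99.80\t8000\t16\t0\t1\t8000\t20000\t27999\t0.0\t14600",
]
print(scaffolds_with_long_hits(example))  # {'ref_scaf7': 23000}
```

With a permissive e-value and a 40 bp word size, a single short exact match is enough to report a hit, so counting only scaffolds with long alignments may give a number closer to the one from read mapping.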
My question is: why this difference? Is there a problem with my mapping? Which is the better approach?
Or is there any other approach? Please share your experience, guys!
EDIT: I am also thinking about a reference-guided re-assembly of my paired-end reads.
Duplicate of Better statergy for Gap Closing
Not exactly. Here, my question is how to subset sequences, not how to close gaps.
I hope you agree with me.
If I were you, I would try out a few different assemblers. For example, IDBA-Hybrid sounds like the perfect match for your problem. Before building it (running make), you might want to edit src/sequence/short_sequence.h to allow longer read lengths.
Thank you! I will try this out.