I have two questions that may sound similar but, I think, are different:
How do I do a genome-guided de novo assembly? And how do I align the de novo assembled sequences to a particular reference genome?
Detailed explanation of the problem: I aligned 100 bp paired-end FASTQ reads from several samples (individuals from different populations) to a reference genome using BWA-MEM.
The initial alignment rate was good (about 80-90% of reads aligned). I then hard-filtered the aligned reads, removing reads:
- in regions with high coverage (above the 97.5th percentile of the coverage distribution)
- whose mates map to a different chromosome
- that overlap sites heterozygous in all samples
- with low MAPQ
- failing several other stringent filters
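The coverage, mate-chromosome and MAPQ filters listed above can be sketched in plain Python on individual SAM fields. This is a minimal sketch under stated assumptions, not the pipeline actually used: `SamRecord`, `percentile`, `keep_read` and the MAPQ cutoff of 30 are hypothetical, the per-read `site_depth` is assumed to come from a coverage tool such as `samtools depth`, and the heterozygous-site filter is omitted because it needs genotype calls.

```python
from dataclasses import dataclass


@dataclass
class SamRecord:
    # Minimal subset of SAM columns (names follow the SAM specification).
    qname: str
    flag: int
    rname: str
    pos: int
    mapq: int
    rnext: str  # "=" means same reference as RNAME, "*" means unavailable


def percentile(values, pct):
    """Linear-interpolation percentile (same convention as numpy's default)."""
    xs = sorted(values)
    k = (len(xs) - 1) * pct / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


MIN_MAPQ = 30  # hypothetical threshold; the post does not state one


def keep_read(rec, site_depth, depth_cutoff, min_mapq=MIN_MAPQ):
    """Return False if the read fails any of the sketched hard filters."""
    if site_depth > depth_cutoff:
        return False  # read lies in a high-coverage region
    paired = rec.flag & 0x1
    if paired and rec.rnext not in ("=", "*", rec.rname):
        return False  # mate mapped to a different chromosome
    if rec.mapq < min_mapq:
        return False  # low mapping quality
    return True
```

The depth cutoff would be computed once per sample, e.g. `percentile(all_site_depths, 97.5)`, and then reused for every read.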
This removed about 30-40% of the aligned reads from each sample. Now I am wondering whether I can:
- take these filtered-out reads (as BAM) and convert them to FASTQ
- run a de novo assembly on this FASTQ (genome-guided or unguided?)
- or merge/align the de novo assembly to a reference
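For the BAM-to-FASTQ step, the main subtlety is that BAM stores reverse-strand reads reverse-complemented, so they must be flipped back, and mates must be split into R1/R2 files. The sketch below shows that logic on SAM text lines; `sam_to_fastq` and `revcomp` are hypothetical helpers, and it assumes name-sorted input with both mates present and a real quality string (not `*`). In practice, name-sorting with `samtools sort -n` followed by `samtools fastq` with its `-1`/`-2` options does the same job.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def sam_to_fastq(sam_lines):
    """Split SAM records into R1/R2 FASTQ text, restoring original read orientation."""
    r1, r2 = [], []
    for line in sam_lines:
        if not line or line.startswith("@"):
            continue  # skip empty lines and header lines
        f = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = f[0], int(f[1]), f[9], f[10]
        if flag & (0x100 | 0x800):
            continue  # secondary/supplementary: not the primary copy of the read
        if flag & 0x10:
            # stored reverse-complemented; undo it (and reverse the qualities)
            seq, qual = revcomp(seq), qual[::-1]
        rec = f"@{qname}\n{seq}\n+\n{qual}\n"
        if flag & 0x40:
            r1.append(rec)  # first read in pair
        elif flag & 0x80:
            r2.append(rec)  # second read in pair
    return "".join(r1), "".join(r2)
```

With name-sorted input, the two returned strings keep mates in the same order, which is what most assemblers expect from paired FASTQ files.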
I think this approach could, to a certain extent, handle paralogous alignments and help identify large indels and chromosomal rearrangements. Has anyone tried this? Which software could do it?