I use bwa-mem and samtools mpileup to align my species to a reference sequence for a few genes. My problem is that, first, the reference species is somewhat distantly related to the rest of the order I am studying. However, my real problem is that after running bwa-mem and gathering consensus sequences for each species after generating mpileup files and aligning all my sequences into one multiple sequence alignment - there are regions along the gene that contain gaps where maybe 8-15 out of 50 species had some reads align to that region. When I look at that region and for what species did get reads to align, it's very variable (in relation to the reference and to each other). I think this is why I cannot get anything to align, because flanking these regions are fairly conserved regions that I can get about 40-50 species to have mapped reads.
So my main question is - is there a way to set up the reference file so that I can somehow improve these variable regions to map. For example I tried inserting N's into these regions in the reference file hoping that it would allow more bases to align - I even tried leaving these areas as gaps hoping that would make it less stringent (a little bit de novo maybe?).
Just wondering if anyone knew of a way to improve the mapping for variable regions or when the reference species is distantly related. Thank you!