I want to align reads from a non-model microbat genome to the repeat-masked version of the published microbat (Myotis lucifugus) genome and do variant calling. I have short-insert paired-end data generated on an Illumina NextSeq. The Myotis lucifugus assembly has fairly gappy scaffolds and is, of course, from a different, albeit closely related, species.
Is it more appropriate for me to align my reads as single-end or paired-end reads? BWA and other similar aligners penalize unpaired reads heavily by default. My concern is that I will have reads thrown out because their mate falls either within a masked repetitive region or in a gap of N's in the scaffold.
As a side note, when I run BWA with defaults, I get radically different mapping percentages depending on whether I align the reads as single-end (~60% mapping) or paired-end (~85% mapping). Is this because BWA is penalizing the single-end reads for not having a mate? Would I fix this by reducing the -U penalty for an unpaired read pair from its default of 17 to 0?
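To make the two percentages comparable, it may help to count them the same way in both runs. Below is a minimal sketch, assuming the alignments are available as SAM text and that "mapping percentage" means the fraction of primary records without the unmapped flag (0x4), ignoring secondary (0x100) and supplementary (0x800) records. The function name `mapped_fraction` and the toy records are hypothetical and only for illustration.

```python
def mapped_fraction(sam_lines):
    """Return the fraction of primary SAM records that are mapped.

    Header lines (starting with '@') are skipped. Secondary (0x100) and
    supplementary (0x800) records are ignored so that each read is
    counted once, whether the run was single-end or paired-end.
    """
    total = 0
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # not a valid alignment record
        flag = int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary and supplementary alignments
        total += 1
        if not flag & 0x4:  # 0x4 set means the read itself is unmapped
            mapped += 1
    return mapped / total if total else 0.0


# Toy records (made up): only the first two SAM columns matter here.
toy_sam = [
    "@SQ\tSN:scaffold1\tLN:1000",
    "r1\t99\tscaffold1\t100",    # mapped, properly paired, first in pair
    "r1\t147\tscaffold1\t300",   # mapped, properly paired, second in pair
    "r2\t73\tscaffold1\t500",    # mapped, but its mate is unmapped
    "r2\t133\tscaffold1\t500",   # unmapped mate of the read above
    "r1\t2147\tscaffold1\t800",  # supplementary record, not counted
]

toy_rate = mapped_fraction(toy_sam)
```

For a real run, the same function could be given an open file handle on an uncompressed SAM file, once for the single-end output and once for the paired-end output.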
Sorry that this is a few questions bundled together. This is also my first time posting here and I apologize if I've missed a rule.