I have a handful of genes of interest that I am trying to identify in a plant transcriptome dataset.
I used bwa mem to map the paired-end transcriptome reads against a reference dataset generated from the genes of interest.
The gene sequences I used as leads are not from the same organism as the transcriptome, so I am expecting mismatches. I got pretty good coverage for most of the gene sequences, but the reads cluster in chunks, with a lot of soft clipping at the edges of the chunks and little or no coverage in between.
I am wondering how to adjust the bwa mem parameters to recover more of the sequence in between the chunks.
Which parameters are good to modify? Matching score, mismatch penalty, or gap penalties?
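To make the question concrete, here is a minimal sketch of the kind of run I have in mind. The flags and defaults are the ones listed in the bwa mem usage text (-k, -A, -B, -O, -E, -L, -T). The file names and the relaxed values are placeholders I made up for illustration, not tuned recommendations, and the helper build_bwa_mem_command is hypothetical. It only builds and prints the command line; it does not run bwa.

```python
import shlex

# bwa mem scoring-related options with their documented defaults.
BWA_MEM_DEFAULTS = {
    "-k": 19,  # minimum seed length
    "-A": 1,   # matching score
    "-B": 4,   # mismatch penalty
    "-O": 6,   # gap open penalty
    "-E": 1,   # gap extension penalty
    "-L": 5,   # clipping penalty
    "-T": 30,  # minimum alignment score to output
}


def build_bwa_mem_command(reference, reads_1, reads_2, overrides=None):
    """Return a bwa mem command (as a list) with selected defaults overridden."""
    options = dict(BWA_MEM_DEFAULTS)
    for flag, value in (overrides or {}).items():
        if flag not in options:
            raise ValueError(f"unknown option: {flag}")
        options[flag] = value
    command = ["bwa", "mem"]
    for flag, value in options.items():
        command += [flag, str(value)]
    command += [reference, reads_1, reads_2]
    return command


# Placeholder file names and illustrative (untested) relaxed values:
# shorter seeds, cheaper mismatches and gaps, costlier clipping,
# and a lower score threshold for reporting alignments.
relaxed = build_bwa_mem_command(
    "genes_of_interest.fa",
    "reads_1.fq",
    "reads_2.fq",
    overrides={"-k": 15, "-B": 2, "-O": 4, "-L": 10, "-T": 20},
)
print(shlex.join(relaxed))
```

Is this the right set of knobs to turn for cross-species mapping, and if so, in which direction and by how much?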
I appreciate any suggestions!