I have a set of BAM files (pair-end reads) from which I want to identify reads having more than one mutation. I already have the mutation calls for the same data. I am thinking of the following steps to achieve this.
- Identify pairs of "nearby" mutations (based on a maximum distance, say 400-500).
- Construct a BED file from above containing chromosome, start and end information, where 'start' will be the first and 'end' will be the second of above pair.
- Find intersecting regions between above BED and BAM file to filter putative reads.
- Query each read for presence of mutation pair
I have two questions. Firstly, I am wondering if I have to somehow take the strand information into account while finding intersected regions (I don't have the strand information in the variant-calling file)? Secondly, is there an easier alternative that I may use for this analysis? Apologies if some of the info is not-correct/too-basic as I am newbie to this kind of analysis.