I have paired-end NGS data from a fruit fly population, and I am trying to detect inversions from the paired-end insert size, on the premise that a much larger insert size will be observed in the presence of an inversion. (A breakpoint between the two reads of a pair will increase the insert size when they are aligned to the reference genome.)
But I find the situation is much more complicated than I expected. The reads can be mapped in different ways, e.g. as supplementary alignments or chimeric reads. I also noticed that the SAM flag (the second column in SAM files) provides such information, but I am not clear on how to filter reads according to these flags.
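As far as I understand from the SAM specification, the flag is a bit field in which each bit records one property of the alignment. Below is a minimal sketch of the kind of filter I have in mind. The function names (`decode_flag`, `keep_for_insert_size`) and the choice of which bits to exclude are my own assumptions, not an established recipe, so please correct me if this is the wrong approach.

```python
# Bit values as defined in the SAM specification.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

# Assumption: these are the properties I would exclude before
# looking at insert sizes.
EXCLUDE = 0x4 | 0x8 | 0x100 | 0x200 | 0x400 | 0x800


def decode_flag(flag):
    """Return the set of property names whose bits are set in the flag."""
    return {name for bit, name in FLAG_BITS.items() if flag & bit}


def keep_for_insert_size(flag):
    """Keep only paired, primary alignments where both mates are mapped."""
    return bool(flag & 0x1) and not (flag & EXCLUDE)


if __name__ == "__main__":
    for flag in (99, 2147, 73):
        print(flag, sorted(decode_flag(flag)), keep_for_insert_size(flag))
```

Note that I deliberately do not require the "proper pair" bit (0x2) here, since I assume the pairs with an unusually large insert size are exactly the ones the aligner would not mark as properly paired.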
My question is: in order to infer inversions based on insert size, how should I filter reads?
Thanks in advance!