Hi, I have a BAM file from WGS sequencing, in which I am trying to identify the integration site(s) of a transgene insertion in the mouse genome. I mapped the paired-end reads with bwa against the indexed transgene sequence. The goal is to identify pairs in which only one read maps to the transgene (while its mate should map to the mouse genome), as well as reads that are soft-clipped because only part of the read maps to the transgene.
I know that the SA tag marks such split reads, but I am not sure how to extract them from the BAM file. When I run
grep SA: file.bam
do I extract both reads of the pair, or only the one that was mapped to the transgene?
Would it be better to use samtools to extract reads with a specific flag? Would both reads of the pair be extracted in that case?
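To make the question concrete, here is a minimal sketch of the kind of filtering described above. It assumes the input is SAM text (for example the output of samtools view -h file.bam, since a BAM file is compressed binary and cannot be searched reliably with plain grep). The function names parse_sam_line, junction_evidence and candidate_read_names are hypothetical and only for illustration. It flags three kinds of evidence on reads mapped to the transgene: an SA tag (split alignment), a soft clip in the CIGAR string, and a paired read whose mate is unmapped (flag bit 0x8).

```python
import re

def parse_sam_line(line):
    """Split one SAM alignment line into the fields used below."""
    fields = line.rstrip("\n").split("\t")
    return {
        "qname": fields[0],       # read name, shared by both mates of a pair
        "flag": int(fields[1]),   # bitwise FLAG
        "cigar": fields[5],       # CIGAR string, "*" if unavailable
        "tags": fields[11:],      # optional TAG:TYPE:VALUE fields
    }

def junction_evidence(rec):
    """Return a label if a mapped read hints at a junction, else None."""
    flag = rec["flag"]
    if flag & 0x4:                # the read itself is unmapped
        return None
    if any(tag.startswith("SA:Z:") for tag in rec["tags"]):
        return "split"            # part of the read aligns elsewhere
    if re.search(r"\d+S", rec["cigar"]):
        return "soft-clipped"     # part of the read is not aligned
    if flag & 0x1 and flag & 0x8:
        return "mate-unmapped"    # paired read whose mate did not map
    return None

def candidate_read_names(sam_lines):
    """Map each candidate read name to the set of evidence labels seen."""
    names = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue              # skip blank lines and header lines
        rec = parse_sam_line(line)
        kind = junction_evidence(rec)
        if kind is not None:
            names.setdefault(rec["qname"], set()).add(kind)
    return names
```

Because both mates of a pair carry the same read name, collecting names rather than individual lines is one way to pull out both reads afterwards, for instance by writing the names to a file and passing it to samtools view with the -N option in recent samtools versions. Whether this is the best approach for locating integration sites is exactly what I am unsure about.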
Is there a better way to identify such integration sites of exogenous sequences?