I am working with high-coverage, paired-end sequencing reads of a bacterial genome with a DNA transposon hopping around inside the genome.
I've used BWA mem to align my reads to a reference made by concatenating the bacterial and transposon reference sequences. I now want to pull out reads in which the transposon has inserted into itself. In this self-insertion case, part of a read would map to an arbitrary position within the transposon and the rest would abruptly map to the beginning of the transposon, or vice versa.
From what I can tell, there isn't a flag that lets you pull these reads specifically, since they all map to the same reference. I'm quite new to bioinformatics, though, so I wouldn't be surprised if I'm overlooking something with the flags. I've also tried looking at the CIGAR strings in the SAM file, but I haven't had any success there either.
Any suggestions would be greatly appreciated. Thanks!
Just a comment: for transposon self-insertion, couldn't you align to the transposon reference by itself, and then pull out reads that map discordantly? (i.e. those without the expected strand orientation and insert size?) DNA fragments from your experiment that cross self-insertion sites would have discordant properties when mapped back to a single transposon reference - wouldn't they?
I was just mapping to both genomes because I have other data I'm extracting from the reads. I can try mapping to just the transposon though and look for those discordant reads. Thanks for the suggestion.
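For the discordant-pair idea above, here is a minimal sketch in plain Python that keeps SAM records whose FLAG says "paired, both mates mapped, but not a proper pair". The bit values come from the SAM specification; the function names and the choice to skip secondary and supplementary records are my own assumptions, and what counts as a "proper pair" depends on the insert-size distribution the aligner estimated. If you have samtools installed, `samtools view -f 1 -F 14` should select a similar set of records.

```python
# FLAG bits as defined in the SAM specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def is_discordant(flag):
    """True for a primary record of a pair where both mates map but not as a proper pair."""
    if not flag & PAIRED:
        return False
    if flag & (UNMAPPED | MATE_UNMAPPED):
        return False
    # Assumption: only primary alignment lines are considered here.
    if flag & (SECONDARY | SUPPLEMENTARY):
        return False
    return not flag & PROPER_PAIR


def discordant_records(sam_lines):
    """Yield SAM alignment lines (header lines skipped) that look discordant."""
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # a SAM record has 11 mandatory fields
            continue
        if is_discordant(int(fields[1])):
            yield line.rstrip("\n")
```

You could run it on an uncompressed SAM file aligned to the transposon alone, e.g. `for rec in discordant_records(open("transposon_only.sam")): print(rec)`, where the file name is only a placeholder.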
A naive method: I would extract the clipped sections and the insertions from the reads mapping to the transposon reference and check whether those sub-sequences fail to align to the bacterial reference but do align to the transposon reference.
(I think that will not be easy for a beginner)
I want to give this a try, but am not sure how to extract the clipped sections of the reads. Do you know of any material I can reference to accomplish that? Thank you for the response.
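One way to get at the clipped sections is to walk the CIGAR string and slice the matching parts out of the SEQ field. The sketch below uses only the standard library; the function names and the `min_len` cutoff of 20 bases are my own assumptions, not anything prescribed by BWA or the SAM format. It collects soft clips (S) and insertions (I). Hard-clipped bases (H) are not present in SEQ, so they cannot be recovered this way.

```python
import re

# One CIGAR operation: a length followed by an operation letter
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read sequence (per the SAM specification)
QUERY_CONSUMING = set("MIS=X")


def clipped_and_inserted(cigar, seq):
    """Return (op, offset_in_read, bases) for every soft clip and insertion."""
    pieces = []
    pos = 0  # current offset within SEQ
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in ("S", "I"):
            pieces.append((op, pos, seq[pos:pos + length]))
        if op in QUERY_CONSUMING:
            pos += length
    return pieces


def pieces_from_sam_line(line, min_len=20):
    """Return (read_name, op, offset, bases) for pieces of at least min_len bases."""
    fields = line.rstrip("\n").split("\t")
    name, flag, cigar, seq = fields[0], int(fields[1]), fields[5], fields[9]
    if flag & 0x4 or cigar == "*" or seq == "*":  # unmapped or no usable alignment
        return []
    return [(name, op, off, sub)
            for op, off, sub in clipped_and_inserted(cigar, seq)
            if len(sub) >= min_len]
```

The pieces can then be written out as FASTA and aligned separately to the bacterial and transposon references. Keep in mind that SEQ is stored in the orientation of the alignment, so pieces from reverse-strand reads are already reverse-complemented relative to the original read.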