I have a question regarding unmapped reads. From the SRBreak paper: "If reads are aligned across breakpoints then some parts of them cannot be mapped the first time. These parts are denoted by the ‘S’ character in the CIGAR strings of these reads". 'S' denotes soft clipping; the clipped nucleotides are still present in the read. I can count the 'S' operations in the CIGAR string. Does anybody know how I can take these split reads and align them to a reference genome again?
I have written a tool to do exactly this. gridss.SoftClipsToSplitReads extracts the clipped bases and repeatedly realigns them to the reference with the aligner of your choice (bwa by default). The latest development version (currently undergoing internal testing) also supports realignment of existing split reads (e.g. if you don't like bwa's SA split read alignment) as well as of the entire read (which I use to validate that my assembly contigs actually originate from where I expect them to).
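
To make the "extract the clipped bases" step concrete, here is a minimal Python sketch. It is only an illustration and not how gridss.SoftClipsToSplitReads is implemented; the function name `soft_clipped_segments` and the "left"/"right" labels are made up for this example. It assumes you already have the CIGAR and SEQ fields of a SAM record as plain strings.

```python
import re

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clipped_segments(cigar, seq):
    """Return a list of (side, bases) for the soft-clipped ends of a read.

    Hypothetical helper for illustration: `cigar` and `seq` are the CIGAR
    and SEQ columns of one SAM record.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not stored in SEQ, so drop them; after
    # that, any soft clip (S) is the first or last remaining operation.
    ops = [(n, op) for n, op in ops if op != "H"]
    segments = []
    if ops and ops[0][1] == "S":
        segments.append(("left", seq[:ops[0][0]]))
    if len(ops) > 1 and ops[-1][1] == "S":
        segments.append(("right", seq[len(seq) - ops[-1][0]:]))
    return segments

# Example: 5 clipped bases, 10 aligned bases, 3 clipped bases.
print(soft_clipped_segments("5S10M3S", "AAAAACCCCCCCCCCGGG"))
```

The extracted segments could then be written out as FASTA/FASTQ records and passed to an aligner for a second round of alignment. Note that SEQ is stored in the orientation of the reference forward strand, so for reads aligned to the reverse strand the clipped bases are reverse-complemented relative to the original read.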