Question: How can I split soft-clipped reads and map the split reads again?
24 months ago by
fatima.m.zare20 wrote:

I have a question regarding unmapped reads. From the SRBreak paper: "If reads are aligned across breakpoints then some parts of them cannot be mapped the first time. These parts are denoted by the ‘S’ character in the CIGAR strings of these reads". 'S' denotes soft clipping: the clipped nucleotides are still present in the read sequence. I can find the reads that have an 'S' operation in their CIGAR string. Does anybody know how I can split these reads and align the clipped parts to a reference genome again?

modified 24 months ago by d-cameron2.1k • written 24 months ago by fatima.m.zare20

Duplicate of "extracting the soft clipped seq only from a sam file".

written 24 months ago by Pierre Lindenbaum127k

I interpreted this as a slightly different question, since the other question doesn't cover the additional steps required to turn the soft-clipped reads and the alignments of their clipped fragments into split reads. You need to:

  • match the fragments back to their reads

  • drop unmapped fragments - these reads stay as soft clipped reads

  • rehydrate the sequence and quality scores of the originating read (or write a hard clip)

  • replace all the SAM flags, fields and tags with those of the original soft-clipped read, except the alignment-specific ones such as RNAME, POS, CIGAR, and the NM tag

  • set supplementary flag

  • write SA tags

  • merge the new supplementary reads back into the input file in their mapped position (they were extracted according to the position of the primary soft clipped alignment)
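As a rough illustration of the flag, hard-clip and SA-tag steps above, here is a minimal Python sketch. It is not taken from any tool: records are plain dicts with hypothetical keys (`flag`, `rname`, `pos`, `mapq`, `cigar`, `nm`, `seq`, `qual`), `make_supplementary` and `sa_entry` are made-up helper names, and it assumes the fragment was cut from the primary's stored SEQ and realigned on its own. It takes the hard-clip route rather than rehydrating the full sequence, and it skips mate fields and merging by position.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
REVERSE = 0x10         # SAM flag bit: SEQ is reverse complemented

def sa_entry(rec):
    """Format one alignment as an SA tag element: rname,pos,strand,CIGAR,mapQ,NM;"""
    strand = "-" if rec["flag"] & REVERSE else "+"
    return f'{rec["rname"]},{rec["pos"]},{strand},{rec["cigar"]},{rec["mapq"]},{rec["nm"]};'

def make_supplementary(primary, frag, offset):
    """Turn a realigned fragment into a supplementary record of its source read.

    offset is the index in the primary's stored SEQ where the fragment began.
    """
    before = offset
    after = len(primary["seq"]) - offset - len(frag["seq"])
    frag_reverse = bool(frag["flag"] & REVERSE)
    if frag_reverse:
        # The fragment was reverse complemented, so the clipped flanks swap sides.
        before, after = after, before
    # Pad the fragment's CIGAR with hard clips for the rest of the read.
    cigar = (f"{before}H" if before else "") + frag["cigar"] + (f"{after}H" if after else "")
    flag = primary["flag"] | SUPPLEMENTARY
    if frag_reverse:
        flag ^= REVERSE  # strand is relative to the primary's stored orientation
    supp = dict(primary)  # inherit read name and other non-alignment fields
    supp.update(flag=flag, rname=frag["rname"], pos=frag["pos"], mapq=frag["mapq"],
                cigar=cigar, nm=frag["nm"], seq=frag["seq"], qual=frag["qual"])
    supp["sa"] = sa_entry(primary)  # supplementary points back at the primary
    return supp

# Made-up example: 100 bp read, last 40 bases soft clipped, fragment maps to chr2 reverse strand.
primary = dict(qname="r1", flag=0, rname="chr1", pos=100, mapq=60,
               cigar="60M40S", nm=0, seq="A" * 100, qual="I" * 100)
frag = dict(flag=16, rname="chr2", pos=500, mapq=30,
            cigar="40M", nm=1, seq="T" * 40, qual="I" * 40)
supp = make_supplementary(primary, frag, offset=60)
primary["sa"] = primary.get("sa", "") + sa_entry(supp)  # primary lists its supplementary
```

The SA element layout (rname, pos, strand, CIGAR, mapQ, NM, terminated by a semicolon) follows the SAM optional-fields specification; everything else about the record layout here is an assumption for the sake of the example.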

ADD REPLYlink modified 24 months ago • written 24 months ago by d-cameron2.1k
24 months ago by
d-cameron2.1k wrote:

I have written a tool to do exactly this. gridss.SoftClipsToSplitReads extracts the clipped bases and repeatedly realigns them to the reference with the aligner of your choice (bwa by default). The latest development version (currently undergoing internal testing) also supports realignment of existing split reads (e.g. if you don't like the bwa SA split read alignment) as well as of the entire read (which I use to validate that my assembly contigs actually originate from where I expect them to).
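For readers who want to see what "extracting the clipped bases" involves, the following is a minimal standard-library Python sketch, not the GRIDSS implementation. The function names `soft_clips` and `to_fastq` and the `#left`/`#right` read-name suffix are assumptions made for this example; the suffix is just one hypothetical way to match fragments back to their reads later.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar, seq, qual):
    """Return (side, offset, bases, quals) for each soft-clipped end of a read.

    seq and qual are the SAM SEQ and QUAL fields, i.e. already in the
    orientation in which the read was aligned.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are not present in SEQ, so ignore them here.
    core = [(n, op) for n, op in ops if op != "H"]
    clips = []
    if core and core[0][1] == "S":
        n = core[0][0]
        clips.append(("left", 0, seq[:n], qual[:n]))
    if len(core) > 1 and core[-1][1] == "S":
        n = core[-1][0]
        clips.append(("right", len(seq) - n, seq[-n:], qual[-n:]))
    return clips

def to_fastq(read_name, clips):
    """Render the clipped fragments as FASTQ records for realignment."""
    return "".join(f"@{read_name}#{side}\n{bases}\n+\n{quals}\n"
                   for side, _offset, bases, quals in clips)

# Made-up read: 5 clipped bases on the left, 3 on the right.
clips = soft_clips("5S10M3S", "AAAAACCCCCCCCCCGGG", "I" * 18)
fastq = to_fastq("read1", clips)
```

The resulting FASTQ text could then be passed to an aligner; in practice very short fragments are usually not worth realigning, and paired reads would need the first/second-in-pair information carried in the fragment name as well.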

written 24 months ago by d-cameron2.1k

Thanks. I have read your paper and looked at your GitHub repository. I have a BAM file and a reference FASTA file, and I want to realign the soft-clipped bases of my BAM file with the BWA aligner. I think I should use SoftClipsToSplitReads, but I don't know how to run it. Could you please show me a straightforward way to do that?

written 23 months ago by fatima.m.zare20

The simplest command line looks like the following:

java -Xmx512M -cp gridss-VERSION-with-dependencies.jar gridss.SoftClipsToSplitReads I=your_input.bam O=your_output.bam REFERENCE_SEQUENCE=your_reference.fa

written 23 months ago by d-cameron2.1k