Question: Double soft-clipped reads and how to romeve them
gravatar for Carles Borreda
2.9 years ago by
IVIA, Valencia, Spain
Carles Borreda0 wrote:


I have some reads aligned with bwa 0.7.15, and I've found some regions with reads ending with soft clipped bases in both ends (example CIGAR string 14S58M28S). I know that soft clipped reads could have a biological meaning, but in this case it seems more like a mapping error to me (only a part of the read has been mapped, while both ends have not). However, it is a primary alignment and the mapping quality seems decent (MAPQ=32), even if it's not exceptionally high.

I have two questions here:

-Am I right with my assumption of these being mapping errors? Is there any other explanation (biological or computational) for this events to happen?

-Is there any tool available for getting rid of double-soft-clipped reads? I could probably grep those lines out from the SAM file, but it will just get too long for applying it to many files.

Thank you all

next-gen alignment • 1.2k views
ADD COMMENTlink modified 2.9 years ago by Pierre Lindenbaum131k • written 2.9 years ago by Carles Borreda0

I don't think this is a good enough reason to second guess the aligner and it sounds quite arbitrary to me. I would leave those reads in.

You also didn't tell us about your application, so that makes it harder to judge.

ADD REPLYlink written 2.9 years ago by WouterDeCoster44k

Well I am looking for structural variations in several genomes (Paired End Illumina) and I am trying to validate my pipeline against parent-child pairs and sequencing replicates (which should in principle harbor little to none SVs, specially the sequencing replicates). It seems that there is some noise (considering as noise all SVs present in only one of the sequencing replicates) associated to this reads. In those cases, specific regions have a bunch of reads with their pairs mapped in different chromosomes (many of them for each region) and with this double-soft-clipping. That was what made me think about this reads, and I could not get any biological explanation for the double soft-clipping by myself. Simple soft-clipping are possible breakpoints for the SVs, but two SVs that close in the genome, pointing to several different chromosomes, didn't seem right for me.

ADD REPLYlink written 2.9 years ago by Carles Borreda0
gravatar for Devon Ryan
2.9 years ago by
Devon Ryan97k
Freiburg, Germany
Devon Ryan97k wrote:
  1. No, that can certainly happen, especially if you happen to have some smaller fragments being sequenced or the species you're aligning against doesn't well match what you sequenced.
  2. That's simple enough with a couple lines of pysam (and I suspect one could do it with jvarkit), but the better question is, "why bother?"
ADD COMMENTlink written 2.9 years ago by Devon Ryan97k
gravatar for Pierre Lindenbaum
2.9 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

using samjdk:

java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag()) return true;final Cigar c=record.getCigar();if(c==null || c.numCigarElements()<2) return true; return !(c.getFirstCigarElement().getOperator().isClipping() && c.getLastCigarElement().getOperator().isClipping()) ;' input.bam
ADD COMMENTlink written 2.9 years ago by Pierre Lindenbaum131k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1385 users visited in the last hour