Question: Double soft-clipped reads and how to remove them
Carles Borreda (IVIA, Valencia, Spain) wrote, 2.2 years ago:

Hi

I have some reads aligned with bwa 0.7.15, and I've found some regions where reads have soft-clipped bases at both ends (example CIGAR string: 14S58M28S). I know that soft-clipped reads can have a biological meaning, but in this case it looks more like a mapping error to me (only part of the read has been mapped, while both ends have not). However, it is a primary alignment and the mapping quality seems decent (MAPQ=32), even if it's not exceptionally high.

I have two questions here:

- Am I right in assuming that these are mapping errors? Is there any other explanation (biological or computational) for these events?

- Is there any tool available for getting rid of double-soft-clipped reads? I could probably grep those lines out of the SAM file, but that would take too long to apply to many files.

Thank you all

next-gen alignment • 1.0k views

I don't think this is a good enough reason to second-guess the aligner, and it sounds quite arbitrary to me. I would leave those reads in.

You also didn't tell us about your application, so that makes it harder to judge.

written 2.2 years ago by WouterDeCoster

Well, I am looking for structural variations in several genomes (paired-end Illumina), and I am trying to validate my pipeline against parent-child pairs and sequencing replicates (which should in principle harbor few to no SVs, especially the sequencing replicates). There seems to be some noise (counting as noise all SVs present in only one of the sequencing replicates) associated with these reads. In those cases, specific regions have a bunch of reads whose mates map to different chromosomes (many of them for each region) and which show this double soft-clipping. That was what made me think about these reads, and I could not come up with any biological explanation for the double soft-clipping by myself. Single soft clips are possible breakpoints for the SVs, but two SVs that close together in the genome, pointing to several different chromosomes, didn't seem right to me.

written 2.2 years ago by Carles Borreda
Devon Ryan (Freiburg, Germany) wrote, 2.2 years ago:
  1. No, that can certainly happen, especially if some smaller fragments were sequenced or the species you're aligning against doesn't closely match what you sequenced.
  2. That's simple enough with a couple lines of pysam (and I suspect one could do it with jvarkit), but the better question is, "why bother?"
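
As a rough sketch of the kind of filter mentioned in point 2, here is a version that works on SAM text using only the Python standard library rather than pysam. The names `is_double_soft_clipped` and `filter_sam` are illustrative and do not come from any library, and the example records are made up.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM spec.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def is_double_soft_clipped(cigar):
    """Return True if the CIGAR has soft clips (S) at both ends.

    Hard clips (H) outside the soft clips are ignored, so
    5H14S58M28S still counts as double soft-clipped.
    """
    ops = [op for _, op in CIGAR_ELEMENT.findall(cigar)]
    ops = [op for op in ops if op != "H"]  # H may only appear at the very ends
    return len(ops) >= 3 and ops[0] == "S" and ops[-1] == "S"


def filter_sam(lines):
    """Yield header lines and alignments that are not double soft-clipped."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the SAM header untouched
            continue
        fields = line.rstrip("\n").split("\t")
        # Column 6 of a SAM record is the CIGAR string.
        if len(fields) > 5 and is_double_soft_clipped(fields[5]):
            continue
        yield line


# Made-up records for illustration: r1 is clipped at both ends, r3 at one end.
example = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t100\t32\t14S58M28S\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t200\t60\t100M\t*\t0\t0\t*\t*\n",
    "r3\t0\tchr1\t300\t60\t72M28S\t*\t0\t0\t*\t*\n",
]
kept = list(filter_sam(example))
```

To use it as a stream filter, one could call `sys.stdout.writelines(filter_sam(sys.stdin))` and feed it the output of `samtools view -h input.bam`. Note that this drops individual records, so the mate of a removed read stays in the output.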
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 2.2 years ago:

using samjdk: http://lindenb.github.io/jvarkit/SamJdk.html

java -jar dist/samjdk.jar -e 'if(record.getReadUnmappedFlag()) return true;final Cigar c=record.getCigar();if(c==null || c.numCigarElements()<2) return true; return !(c.getFirstCigarElement().getOperator().isClipping() && c.getLastCigarElement().getOperator().isClipping()) ;' input.bam
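
To make explicit what the expression above keeps and drops, here is the same predicate restated in Python over a CIGAR string. This is only a sketch: the function name `keep_record` and its `unmapped` argument are hypothetical stand-ins for the htsjdk calls. One thing it highlights is that `isClipping()` is true for hard clips (H) as well as soft clips (S), so reads with H at both ends are dropped too.

```python
import re

# One CIGAR element: a length followed by an operator from the SAM spec.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")
CLIPPING_OPS = {"S", "H"}  # operators for which htsjdk's isClipping() is true


def keep_record(cigar, unmapped=False):
    """Return True if the read is kept, following the samjdk expression."""
    if unmapped:
        return True  # unmapped reads pass through untouched
    ops = [op for _, op in CIGAR_ELEMENT.findall(cigar)]
    if len(ops) < 2:
        return True  # an empty or single-element CIGAR cannot be clipped at both ends
    # Drop the read only when both the first and the last element are clips.
    return not (ops[0] in CLIPPING_OPS and ops[-1] in CLIPPING_OPS)
```

With the CIGAR from the question, `keep_record("14S58M28S")` is false (the read is dropped), while a read clipped at one end only, such as `72M28S`, is kept.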