Question: Soft-Clipped Vs Unmapped?
7.9 years ago, Bioscientist (1.7k) wrote:

I'm quite confused about these two terms. I'm reading about Pindel, a split-read algorithm. The author seems to make use of information from "unmapped" reads. There are also other split-read-based algorithms that use "soft-clipped" reads, i.e. reads in which part of the sequence is left unaligned.

To me, the two look quite similar. Say we have a 100 bp read, 50 bp of which can be mapped while the other 50 bp cannot. How would BWA categorize this read? Would BWA call it "unmapped" because 50 bp cannot be mapped, or "mapped" but with 50 bp of "soft-clipped" sequence?

Or does BWA have a scoring system for mapping that sets a threshold for distinguishing the two?


Edit: maybe this is related to "centeredness"? Say the breakpoint falls at 99:1; then the 99 bp would be mapped with 1 bp "soft-clipped". But at 50:50, BWA might regard the read as "unmapped".

bwa • 9.5k views
modified 22 months ago by fatima.m.zare20 • written 7.9 years ago by Bioscientist1.7k
7.3 years ago, harremsis (30) wrote:

I'm not an expert on read mapping and am still trying to get to grips with it myself. But in my experience there are cases in which BWA reports extensively soft-clipped reads as mapped. Here's an example from a paired-end Illumina sequencing project:

CTCAG_6_1205_14418_171577_2     163     gi|261748867|gb|CM000804.1|     25090342        17      61S20M  =       25090377        116     TGCAGCCCCGCTTTGGTGAAAAAACAAGATAGGAACTGTTGTTGTTCAACTGTACTGTCACCTGCAGCACACACAACCTCC       bbbeeeeegggggiiighhiiiiiiiiiiihiifhiiiiiihiihhhihihihiiiggggggeeeeedddcdccccccccc       RG:Z:FCC0ACBACXX_L6_4   XT:A:M  NM:i:0  SM:i:17 AM:i:17 XM:i:0  XO:i:0  XG:i:0  MD:Z:20

As you can see from the CIGAR string 61S20M, 61 bp have been soft-clipped from the beginning of the read and only 20 bp are aligned. The flag 163 (=128+32+2+1) indicates that the read was mapped (the "unmapped" bit, 0x4, is 0), paired, mapped in a proper pair, second in pair, and that its mate mapped to the reverse strand (check out this great site for decoding SAM bit flags).
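For anyone who wants to check a flag and CIGAR string by hand, here is a minimal Python sketch. The bit values come from the SAM specification; the function names `decode_flag` and `soft_clipped_bases` are just illustrative names I made up, not part of BWA or samtools.

```python
import re

# Bit values as defined in the SAM specification (first eight bits only).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}

def decode_flag(flag):
    """Return the names of all bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]

def soft_clipped_bases(cigar):
    """Sum the lengths of all soft-clip (S) operations in a CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in ops if op == "S")

print(decode_flag(163))              # the "unmapped" bit is not in the list
print(soft_clipped_bases("61S20M"))  # number of soft-clipped bases
```

So a read is "unmapped" only when the 0x4 bit is set; soft-clipping is recorded separately in the CIGAR string of a read that is flagged as mapped.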

So it seems that BWA reports reads as mapped even with more than 50% soft-clipping. So far I have not figured out how to tell BWA not to do that, which I would actually prefer.
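One possible workaround is to filter after mapping rather than inside BWA. The sketch below is not a BWA option; it is a hypothetical post-processing step that works on plain SAM text lines, and both the helper names and the 0.5 threshold are assumptions chosen for illustration.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_fraction(cigar):
    """Fraction of the read that is soft-clipped, computed from the CIGAR."""
    ops = [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]
    # M, I, S, = and X consume read bases; D, N, H and P do not.
    read_len = sum(length for length, op in ops if op in "MIS=X")
    clipped = sum(length for length, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

def keep_record(sam_line, max_fraction=0.5):
    """Return False for mapped reads whose soft-clipped fraction exceeds max_fraction."""
    if sam_line.startswith("@"):
        return True  # header lines pass through untouched
    fields = sam_line.rstrip("\n").split("\t")
    flag, cigar = int(fields[1]), fields[5]
    if flag & 0x4 or cigar == "*":
        return True  # unmapped read: there is no alignment to evaluate
    return soft_clip_fraction(cigar) <= max_fraction
```

With the 61S20M read above, 61 of 81 bases are clipped, so `keep_record` would drop it at the default threshold. You could apply it to every line of `samtools view -h` output and pipe the kept lines back into samtools.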


The mapping quality (5th field) is only 17, which equates to a probability of 0.01995262 (about 2%) that the mapping is incorrect. That is quite high when you are mapping millions of reads.
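The conversion behind that number is the Phred-scaled definition of MAPQ from the SAM specification, p = 10^(-MAPQ/10). A small sketch, where the function name and the one-million-read figure are only illustrative:

```python
def mapq_to_error_probability(mapq):
    """Convert a Phred-scaled mapping quality to the probability the mapping is wrong."""
    return 10 ** (-mapq / 10)

p = mapq_to_error_probability(17)
print(f"probability of a wrong mapping: {p:.8f}")

# Illustrative only: expected number of misplaced reads if one million
# reads all had MAPQ 17.
print(f"expected wrong placements per million reads: {p * 1_000_000:.0f}")
```

For MAPQ 17 this works out to roughly 1 in 50 reads being placed incorrectly.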

written 5.6 years ago by Aaron Statham (1.1k)
7.9 years ago, Geparada (1.4k) wrote:

As I understand the terminology, it will be "mapped" but with 50 bp of "soft-clipped" sequence. Unmapped reads have no part of their sequence aligned to the reference.


I'm just curious how BWA works. Can a read still be considered "mapped" even when half of its length cannot be mapped?

written 7.9 years ago by Bioscientist (1.7k)