Question: Mapping quality score of clipped reads
0
gravatar for radwa.raed
13 months ago by
radwa.raed10
radwa.raed10 wrote:

Hi, I have a question please. For soft-clipped or hard-clipped reads, for example a read with a cigar string 100M50S, does the mapping quality score of the read reflect: 1. Mapping quality of both the 100 bases, as well as the 50 clipped bases? 2. Mapping quality of the 100 bases only?

If it is the combined quality of both the 100 bases + 50 bases, how do I extract the mapping quality of only the 100 matched bases in pysam? Thanks!

pysam bam python • 234 views
ADD COMMENTlink modified 13 months ago by swbarnes29.2k • written 13 months ago by radwa.raed10
0
gravatar for swbarnes2
13 months ago by
swbarnes29.2k
United States
swbarnes29.2k wrote:

The mapping score of the read as a whole has little to do with the quality of the individual letters. It's a measure of how likely it is that the read is aligned to its real position in the genome. It's more a measure of uniqueness of that sequence in your genome than anything else, though if there are many mismatches between the read and the reference that can also make the software question if it put the read in the right place.

ADD COMMENTlink written 13 months ago by swbarnes29.2k

The uniqueness of the 100 nucleotides that matched, or the uniqueness of the 100 nucleotides that matched at position X PLUS the uniqueness of the 50 nucleotides that were soft-clipped and matched at position Y?

ADD REPLYlink written 13 months ago by radwa.raed10

Since the 50 clipped bases are not part of the alignment, they obviously do not count. Why do you think that the 50 base match anywhere? If they were adapter, they would not belong to the reference genome at all.

ADD REPLYlink written 13 months ago by swbarnes29.2k

Thank you! I know that the clipped bases align elsewhere because it is a chimeric read. Below is a supplementary alignment for one of my reads with a mapping quality score of 43.This should map to chr2: 29446797 in hg19

03012p9SX:030B 2064 1 29446797 43 36M241H -1 -1 36 CCCCACGTGAACGAGGGAGGGAGGGAGGGTTGGGTG array('B', [32, 37, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41, 41, 41, 41, 41, 41, 41, 37, 41, 41]) [('SA', 'chr2,42479618,+,238M39S,60,2;'), ('MD', '26A9'), ('RG', 'SAMPLE.sample23'), ('NM', 1), ('AS', 31), ('XS', 23), ('fm', '3_0')]

Since the read is on the negative strand, below is the forward sequence: CACCCAACCCTCCCTCCCTCCCTCGTTCACGTGGGG

However this sequence does not map to chr2: 29446797. I tried blasting but I think it is too short so I aligned it to the genomic sequence as you can see here https://ibb.co/R7VHrQH This is not a good alignment at all. Is BWA wrong? I thought perhaps if the alignment score is a total of both the matched and clipped bases, it could explain the issue. Thank you very much!

ADD REPLYlink written 13 months ago by radwa.raed10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1179 users visited in the last hour