Reads mapped with zero quality still mapped to exactly one position
1
0
Entering edit mode
3.7 years ago
bataloffv ▴ 20

Hi everybody! I'm new to bioinformatics and to this forum. I have Illumina paired-end whole genome sequencing data (read length is 150bp) aligned with bwa. There are regions where all reads are mapped with zero quality. I took all reads covering a specific coordinate from such region:

$ samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 1 | sort | uniq -c

This results in 28 reads:

  1 V300033420L1C001R0181033041
  1 V300033420L1C001R0300317239
  1 V300033420L1C001R0470415198
  1 V300033420L1C003R0011383109
  1 V300033420L1C003R0081312154
  2 V300033420L1C003R0130543193
  1 V300033420L1C003R0140107287
  1 V300033420L1C003R0170739418
  1 V300033420L1C003R0230130936
  1 V300033420L1C003R0231300468
  1 V300033420L1C003R0300977382
  3 V300033420L1C003R0371074884
  2 V300033420L1C003R0371074885
  1 V300033420L1C003R0510100518
  1 V300033420L1C003R0671044234
  1 V300033420L1C004R0210447304
  1 V300033420L1C004R0371389897
  1 V300033420L1C004R0380756132
  1 V300033420L1C004R0550261504
  1 V300033420L1C005R0090362628
  2 V300033420L1C005R0090658320
  1 V300033420L1C005R0160918804
  1 V300033420L1C005R0330628576
  1 V300033420L1C005R0391023819
  1 V300033420L1C006R0021214012
  1 V300033420L1C006R0060892638
  1 V300033420L1C006R0060892639
  1 V300033420L1C006R0161090838

Most reads cross the specified coordinate exactly once, but three reads cross it 2 times (I guess that's an indication of a possible deletion, because paired reads are displayed with their common base name), and there is one read which crosses the coordinate 3 times (how is this possible btw?).

Most reads aligned with CIGAR string 150M:

$ samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 6 | grep 150M | wc -l
20

Next I searched for other places where the reads are mapped:

$ samtools view algn_srt.bam | grep -f read_names.txt >> mq_0.sam
$ samtools view mq_0.sam | cut -f 1 | sort | uniq -c
  2 V300033420L1C001R0181033041
  2 V300033420L1C001R0300317239
  3 V300033420L1C001R0470415198
  2 V300033420L1C003R0011383109
  2 V300033420L1C003R0081312154
  2 V300033420L1C003R0130543193
  2 V300033420L1C003R0140107287
  2 V300033420L1C003R0170739418
  2 V300033420L1C003R0230130936
  2 V300033420L1C003R0231300468
  2 V300033420L1C003R0300977382
  3 V300033420L1C003R0371074884
  3 V300033420L1C003R0371074885
  2 V300033420L1C003R0510100518
  2 V300033420L1C003R0671044234
  2 V300033420L1C004R0210447304
  2 V300033420L1C004R0371389897
  2 V300033420L1C004R0380756132
  2 V300033420L1C004R0550261504
  2 V300033420L1C005R0090362628
  3 V300033420L1C005R0090658320
  2 V300033420L1C005R0160918804
  2 V300033420L1C005R0330628576
  2 V300033420L1C005R0391023819
  2 V300033420L1C006R0021214012
  2 V300033420L1C006R0060892638
  2 V300033420L1C006R0060892639
  2 V300033420L1C006R0161090838

Most reads are encountered two times, that's because their mates are displayed with the same name. Only two reads map to a distant place:

V300033420L1C003R0371074885 339 chr21   6454741 0   82H68M  =   43062949    36608142    GATCCACCCCAGTGATCTGCAGAGGGCGCGGCTTCAGGGCTCAAGGCCAGCAAAAGCCCCGCCTGGAC    GFFFGFFCDFGGEFFFFFFGEFGGGF>FFGFFFFFEGFGG>FGFFFFFFFFFFFFFFDFFCGFCFFGF    SA:Z:chr21,43063049,-,81M69S,0,2;   XA:Z:chr21,-43063063,82S68M,1;  MC:Z:150M   MD:Z:11A56  NM:i:1  AS:i:63 XS:i:63
V300033420L1C001R0470415198 81  chr21   6454910 0   150M    =   43063063    36608005    GGATGAGAAACCCAAGAACCCAGGCCCCAAGTATTAAAGCGGTAATGACCTAAATGTTCAGTGCAGGGCGCTCAGAAGTAAAGGCAGGTGCCTGGGAGGAACTGACCTCCCCGGGGTGCTTCTCGGAGCGGGGCCCCCAGGACAGTCCAG  E=FEFFGFDCFDGFDGF.GFGFG0F'FGGFGEFFFGFBAFEG6FEFFCFFFF?GFFF3GFFFGEFGFFGDEDFGFGGGGGFGGGGFFFGFGFFAFFFFFGGGGGFFFFFFFGGGFFGGF;FGFFGFFGFFGFFFFFFEFF

So the question is: why mapping quality of all these reads is zero, when only two of them have another location where they are mapped?

Thanks!

bwa mapping quality mapq mq • 982 views
ADD COMMENT
0
Entering edit mode

Can you also post the parameters used with bwa for the original alignments?

ADD REPLY
0
Entering edit mode

Bwa was piped after picard SamToFastq:

picard SamToFastq \
I=markilluminaadapters.bam \
FASTQ=/dev/stdout \
CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 INTERLEAVE=true NON_PF=true | \
bwa mem -M -t 4 -p /path/to/broad/hg38.fasta /dev/stdin > algn.sam
ADD REPLY
0
Entering edit mode

Look at the SAM flags (the second field from a SAM record), it could shed some light on this issue:

samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 2 | sort | uniq -c
ADD REPLY
0
Entering edit mode

The output:

  4 147
  1 161
 11 163
  1 339
  1 355
  9 83
  6 99
ADD REPLY
3
Entering edit mode
3.7 years ago

Most tools will not list all locations for multiply mapped reads that match identically in multiple locations.

Usually, one location is picked randomly and will be the one that gets reported with a mapping quality of zero.

You would only get multiple alignments present if these were secondary or chimeric alignments.

Depending on what tool you use it may list the alternative alignments in the optional tags.

ADD COMMENT
0
Entering edit mode

Thank you the answer. Looking at optional tags, I guess this is the indication of where the reads alternatively map?

XA:Z:chr21,-6454604,150M,0;
XA:Z:chr21,+6454616,150M,0;
XA:Z:chr21,+6454627,150M,4;
XA:Z:chr21,+6454627,150M,0;
XA:Z:chr21,+6454627,150M,1;
XA:Z:chr21,+6454627,150M,0;
XA:Z:chr21,-6454632,115M2D35M,2;
XA:Z:chr21,+6454637,150M,10;
XA:Z:chr21,+6454639,150M,1;
XA:Z:chr21,+6454647,150M,1;
XA:Z:chr21,-6454649,150M,3;
XA:Z:chr21,-6454664,150M,2;
XA:Z:chr21,-6454664,144M6S,2;
XA:Z:chr21,-6454674,150M,1;
XA:Z:chr21,+6454683,125M25S,2;
XA:Z:chr21,-6454692,150M,2;
XA:Z:chr21,+6454707,150M,2;
XA:Z:chr21,+6454713,150M,9;
XA:Z:chr21,+6454713,150M,1;
XA:Z:chr21,-6454719,150M,2;
SA:Z:chr21,43063063,+,87S63M,0,1;
XA:Z:chr21,+6454727,150M,1;
SA:Z:chr21,43063063,-,82S68M,0,2;
SA:Z:chr21,6454741,-,82S68M,0,1;
SA:Z:chr21,43063087,+,43M107S,0,2;
SA:Z:chr21,43063044,+,86M64S,0,2;
XA:Z:chr21,+6454741,29S121M,2;
XA:Z:chr21,-6454741,5S145M,4;
XA:Z:chr21,-6454741,15S135M,2;
SA:Z:chr21,43063049,-,81M69S,0,3;
XA:Z:chr21,-6454741,30S120M,2;
XA:Z:chr21,+6454742,150M,1;
XA:Z:chr21,+6454744,150M,3;
ADD REPLY
1
Entering edit mode

that looks like it indeed,

the XA tag indicates an aligner dependent tag, the bwa manual states:

XA  Alternative hits; format: (chr,pos,CIGAR,NM;)*

http://bio-bwa.sourceforge.net/bwa.shtml#4

the SA tag is specified in the SAM Specification as it is a standardized tag:

SA Z Other canonical alignments in a chimeric alignment

https://samtools.github.io/hts-specs/SAMtags.pdf

ADD REPLY

Login before adding your answer.

Traffic: 1957 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6