Question: Reads mapped with zero quality still mapped to exactly one position
0
gravatar for bataloffv
6 weeks ago by
bataloffv20
bataloffv20 wrote:

Hi everybody! I'm new to bioinformatics and to this forum. I have Illumina paired-end whole genome sequencing data (read length is 150bp) aligned with bwa. There are regions where all reads are mapped with zero quality. I took all reads covering a specific coordinate from such region:

$ samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 1 | sort | uniq -c

This results in 28 reads:

  1 V300033420L1C001R0181033041
  1 V300033420L1C001R0300317239
  1 V300033420L1C001R0470415198
  1 V300033420L1C003R0011383109
  1 V300033420L1C003R0081312154
  2 V300033420L1C003R0130543193
  1 V300033420L1C003R0140107287
  1 V300033420L1C003R0170739418
  1 V300033420L1C003R0230130936
  1 V300033420L1C003R0231300468
  1 V300033420L1C003R0300977382
  3 V300033420L1C003R0371074884
  2 V300033420L1C003R0371074885
  1 V300033420L1C003R0510100518
  1 V300033420L1C003R0671044234
  1 V300033420L1C004R0210447304
  1 V300033420L1C004R0371389897
  1 V300033420L1C004R0380756132
  1 V300033420L1C004R0550261504
  1 V300033420L1C005R0090362628
  2 V300033420L1C005R0090658320
  1 V300033420L1C005R0160918804
  1 V300033420L1C005R0330628576
  1 V300033420L1C005R0391023819
  1 V300033420L1C006R0021214012
  1 V300033420L1C006R0060892638
  1 V300033420L1C006R0060892639
  1 V300033420L1C006R0161090838

Most reads cross the specified coordinate exactly once, but three reads cross it 2 times (I guess that's an indication of a possible deletion, because paired reads are displayed with their common base name), and there is one read which crosses the coordinate 3 times (how is this possible btw?).

Most reads aligned with CIGAR string 150M:

$ samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 6 | grep 150M | wc -l
20

Next I searched for other places where the reads are mapped:

$ samtools view algn_srt.bam | grep -f read_names.txt >> mq_0.sam
$ samtools view mq_0.sam | cut -f 1 | sort | uniq -c
  2 V300033420L1C001R0181033041
  2 V300033420L1C001R0300317239
  3 V300033420L1C001R0470415198
  2 V300033420L1C003R0011383109
  2 V300033420L1C003R0081312154
  2 V300033420L1C003R0130543193
  2 V300033420L1C003R0140107287
  2 V300033420L1C003R0170739418
  2 V300033420L1C003R0230130936
  2 V300033420L1C003R0231300468
  2 V300033420L1C003R0300977382
  3 V300033420L1C003R0371074884
  3 V300033420L1C003R0371074885
  2 V300033420L1C003R0510100518
  2 V300033420L1C003R0671044234
  2 V300033420L1C004R0210447304
  2 V300033420L1C004R0371389897
  2 V300033420L1C004R0380756132
  2 V300033420L1C004R0550261504
  2 V300033420L1C005R0090362628
  3 V300033420L1C005R0090658320
  2 V300033420L1C005R0160918804
  2 V300033420L1C005R0330628576
  2 V300033420L1C005R0391023819
  2 V300033420L1C006R0021214012
  2 V300033420L1C006R0060892638
  2 V300033420L1C006R0060892639
  2 V300033420L1C006R0161090838

Most reads are encountered two times, that's because their mates are displayed with the same name. Only two reads map to a distant place:

V300033420L1C003R0371074885 339 chr21   6454741 0   82H68M  =   43062949    36608142    GATCCACCCCAGTGATCTGCAGAGGGCGCGGCTTCAGGGCTCAAGGCCAGCAAAAGCCCCGCCTGGAC    GFFFGFFCDFGGEFFFFFFGEFGGGF>FFGFFFFFEGFGG>FGFFFFFFFFFFFFFFDFFCGFCFFGF    SA:Z:chr21,43063049,-,81M69S,0,2;   XA:Z:chr21,-43063063,82S68M,1;  MC:Z:150M   MD:Z:11A56  NM:i:1  AS:i:63 XS:i:63
V300033420L1C001R0470415198 81  chr21   6454910 0   150M    =   43063063    36608005    GGATGAGAAACCCAAGAACCCAGGCCCCAAGTATTAAAGCGGTAATGACCTAAATGTTCAGTGCAGGGCGCTCAGAAGTAAAGGCAGGTGCCTGGGAGGAACTGACCTCCCCGGGGTGCTTCTCGGAGCGGGGCCCCCAGGACAGTCCAG  E=FEFFGFDCFDGFDGF.GFGFG0F'FGGFGEFFFGFBAFEG6FEFFCFFFF?GFFF3GFFFGEFGFFGDEDFGFGGGGGFGGGGFFFGFGFFAFFFFFGGGGGFFFFFFFGGGFFGGF;FGFFGFFGFFGFFFFFFEFF

So the question is: why mapping quality of all these reads is zero, when only two of them have another location where they are mapped?

Thanks!

bwa mapq mapping quality mq • 150 views
ADD COMMENTlink modified 6 weeks ago by Istvan Albert ♦♦ 84k • written 6 weeks ago by bataloffv20

Can you also post the parameters used with bwa for the original alignments?

ADD REPLYlink written 6 weeks ago by genomax89k

Bwa was piped after picard SamToFastq:

picard SamToFastq \
I=markilluminaadapters.bam \
FASTQ=/dev/stdout \
CLIPPING_ATTRIBUTE=XT CLIPPING_ACTION=2 INTERLEAVE=true NON_PF=true | \
bwa mem -M -t 4 -p /path/to/broad/hg38.fasta /dev/stdin > algn.sam
ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by bataloffv20

Look at the SAM flags (the second field from a SAM record), it could shed some light on this issue:

samtools view algn_srt.bam "chr21:43063074-43063074" | cut -f 2 | sort | uniq -c
ADD REPLYlink written 6 weeks ago by h.mon31k

The output:

  4 147
  1 161
 11 163
  1 339
  1 355
  9 83
  6 99
ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by bataloffv20
3
gravatar for Istvan Albert
6 weeks ago by
Istvan Albert ♦♦ 84k
University Park, USA
Istvan Albert ♦♦ 84k wrote:

Most tools will not list all locations for multiply mapped reads that match identically in multiple locations.

Usually, one location is picked randomly and will be the one that gets reported with a mapping quality of zero.

You would only get multiple alignments present if these were secondary or chimeric alignments.

Depending on what tool you use it may list the alternative alignments in the optional tags.

ADD COMMENTlink modified 6 weeks ago • written 6 weeks ago by Istvan Albert ♦♦ 84k

Thank you the answer. Looking at optional tags, I guess this is the indication of where the reads alternatively map?

XA:Z:chr21,-6454604,150M,0;
XA:Z:chr21,+6454616,150M,0;
XA:Z:chr21,+6454627,150M,4;
XA:Z:chr21,+6454627,150M,0;
XA:Z:chr21,+6454627,150M,1;
XA:Z:chr21,+6454627,150M,0;
XA:Z:chr21,-6454632,115M2D35M,2;
XA:Z:chr21,+6454637,150M,10;
XA:Z:chr21,+6454639,150M,1;
XA:Z:chr21,+6454647,150M,1;
XA:Z:chr21,-6454649,150M,3;
XA:Z:chr21,-6454664,150M,2;
XA:Z:chr21,-6454664,144M6S,2;
XA:Z:chr21,-6454674,150M,1;
XA:Z:chr21,+6454683,125M25S,2;
XA:Z:chr21,-6454692,150M,2;
XA:Z:chr21,+6454707,150M,2;
XA:Z:chr21,+6454713,150M,9;
XA:Z:chr21,+6454713,150M,1;
XA:Z:chr21,-6454719,150M,2;
SA:Z:chr21,43063063,+,87S63M,0,1;
XA:Z:chr21,+6454727,150M,1;
SA:Z:chr21,43063063,-,82S68M,0,2;
SA:Z:chr21,6454741,-,82S68M,0,1;
SA:Z:chr21,43063087,+,43M107S,0,2;
SA:Z:chr21,43063044,+,86M64S,0,2;
XA:Z:chr21,+6454741,29S121M,2;
XA:Z:chr21,-6454741,5S145M,4;
XA:Z:chr21,-6454741,15S135M,2;
SA:Z:chr21,43063049,-,81M69S,0,3;
XA:Z:chr21,-6454741,30S120M,2;
XA:Z:chr21,+6454742,150M,1;
XA:Z:chr21,+6454744,150M,3;
ADD REPLYlink written 6 weeks ago by bataloffv20
1

that looks like it indeed,

the XA tag indicates an aligner dependent tag, the bwa manual states:

XA  Alternative hits; format: (chr,pos,CIGAR,NM;)*

http://bio-bwa.sourceforge.net/bwa.shtml#4

the SA tag is specified in the SAM Specification as it is a standardized tag:

SA Z Other canonical alignments in a chimeric alignment

https://samtools.github.io/hts-specs/SAMtags.pdf

ADD REPLYlink modified 6 weeks ago • written 6 weeks ago by Istvan Albert ♦♦ 84k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1133 users visited in the last hour