Question: Why does inclusion of decoy sequences cause more BWA alignments to an autosome?

I am seeing strange behavior when comparing alignments generated by bwa mem 0.7.12 to two different reference genomes: hg19 and hs37d5 (basically hg19 plus additional decoy sequences). We have DNA-seq data. I noticed that when using hg19, there are very few alignments to the MHC region on chromosome 6, and even fewer of these have nonzero mapping quality. When using hs37d5, there are dramatically more alignments to the region and these have mostly high mapping quality scores. I have not observed this phenomenon anywhere else I've looked in the genome. The behavior is robust to multiple different choices of BWA parameters. Can anyone explain why the inclusion of the additional 35Mb of decoy sequences in hs37d5 would drastically improve the number and quality of alignments to this region of chr6?

Tags: bwa, hs37d5
Written 4 months ago by pamela.russell.ucdenver (modified 3 months ago)

I don't know why the decoy is doing this, but I wonder whether there is something about chromosome 6.

Have you loaded the region in IGV? Are the reads all mapping to a very small region? I recently saw similarly strange behaviour with reads from both ATAC-seq and ChIP-seq data mapping to a small area of chr6. The reads were also highly enriched for very long motifs (20+ bases). I suspect these regions were missed by repeat masking because they only occurred within short regions that bridged very large repeat-masked regions.

Reply written 4 months ago by YaGalbi

Thanks for this idea. I've looked at the alignments in IGV. They do map to several punctate peaks, leaving most of the region uncovered. We would expect coverage of the entire region to be fairly even. I'm guessing we are seeing alignments to regions that are easier to sequence, but still don't know why these alignments disappear when using hg19.

Reply written 3 months ago by pamela.russell.ucdenver

I figured this out. It's because I was using the UCSC version of hg19 that includes some additional assemblies of MHC region haplotypes. Reads were preferentially mapping to those extra contigs instead of chr6.
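
One way to check a reference for this is to look at the per-contig read counts and see whether reads are piling up on alternate haplotype contigs. The following is a minimal sketch, assuming the tab-separated output of `samtools idxstats` (contig name, length, mapped reads, unmapped reads) and assuming the UCSC hg19 naming style for haplotype contigs (for example `chr6_cox_hap2`). The function names and the regular expression are illustrative, not part of any tool.

```python
import re
from typing import Dict, Iterable

# Assumption: alternate haplotype contigs follow the UCSC hg19 naming style,
# e.g. "chr6_apd_hap1" or "chr6_cox_hap2".
HAP_PATTERN = re.compile(r"^chr[0-9XY]+_\w+_hap\d+$")


def mapped_reads_by_contig(idxstats_lines: Iterable[str]) -> Dict[str, int]:
    """Parse `samtools idxstats` lines into {contig name: mapped read count}."""
    counts: Dict[str, int] = {}
    for line in idxstats_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Columns: name, sequence length, mapped reads, unmapped reads.
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts


def haplotype_contigs(counts: Dict[str, int]) -> Dict[str, int]:
    """Keep only contigs whose names look like alternate haplotype assemblies."""
    return {name: n for name, n in counts.items() if HAP_PATTERN.match(name)}
```

To use it, save the output of `samtools idxstats` on the hg19 BAM to a text file, pass the open file to `mapped_reads_by_contig`, and compare the counts returned by `haplotype_contigs` with the count for `chr6`. If the haplotype contigs hold many reads, the reference includes the extra MHC assemblies described above.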

Answer written 3 months ago by pamela.russell.ucdenver