Question: How can I best use -c in bwa mem for alignment and to find polymorphisms?
0
gravatar for kirannbishwa01
18 months ago by
United States
kirannbishwa01500 wrote:

I am doing bwa mem alignment and want to use the -c in a best way. As, I understand -c INT is meant to discard a MEM reads if it has more than INT occurences in a genome. The default -c value is 10000, so a read may be potentially aligned to at maximum of 10000 places in the genome. But, reporting of an alignment is also controlled by -T [default value 30], i.e. no alignment will be reported that has value below 30 (1st Question - so is this value of 30 referring to mapQ??).

I want to do a variant calling on my population samples using genome re-seq data that has coverage above 10x (10 to 70). The population should not vary more than 1%.

I previously did bwa mem alignment using default parameter, and I choose uniquely mapped reads  using -f 2 flag in samtools, and/or used mapQ values to select for the reads that show good mapping quality (I think mapQ > 39, is good, since anything that maps to more places will have its mapQ value exponentially reduced).

 

So, if I chose to do -c 1, any reads that maps uniquely won't be affected in the report. But, the reads that potentially can map to multiple places will be selected for the best MEM, rite?? since -c is set to 1.

But will this multiple mapping reads have the high mapQ values sinces its alignment to other places in the genome is restricted?

Additionally, what will happen if the reads can map equally to two different places with equal mismatches and gaps?

Finally, is the setting -c 1 as good way of identifying polymorhisms? - I am kind of in a grey area here and would appreciate any suggestions - I understand it all depends upon if the reads has repeatitive bases in it and/or the amount of mismatches/gaps it may have compared to one region of the genome vs. the other one, But, my guess if I set -c 1 and restrict the mutiple mapping reads to only one alignment the chances that the alignment and identified polymorphisms is right is high - since the reference genomes are generated using highly inbred lines which are mostly homozygous. My guess is setting -c 1 (or possibly 2) will help use capture valuable polymorphisms, but I also plan on using mapQ to supplement the selection.

 

Thanks,

 

 

sequencing snp mapq bwa alignment • 1.1k views
ADD COMMENTlink modified 18 months ago • written 18 months ago by kirannbishwa01500
1
gravatar for Istvan Albert
18 months ago by
Istvan Albert ♦♦ 73k
University Park, USA
Istvan Albert ♦♦ 73k wrote:

BWA mem will produce SAM flags for both secondary and supplementary alignments. The mapping quality will also be set to zero when the read is multimapping.

In general genomic variation should be discovered from all alignments. Trying to tweak the aligner is not a good strategy. For every alignment you know its mapping quality, you know if it is primary or secondary what good could come from willfully ignoring this information?

ADD COMMENTlink written 18 months ago by Istvan Albert ♦♦ 73k
1

To add to Stefan's comment. I think most SNP callers will not take a multi-mapping read seriously and will give it poor quality. So there is no need to tweak the alignment, rather you can play with the quality of called SNPs.

ADD REPLYlink written 18 months ago by apelin20460
0
gravatar for kirannbishwa01
18 months ago by
United States
kirannbishwa01500 wrote:

Thanks for the information. I was thinking that tweaking would help but after much reading and reviewing your comments it looks like it may not be helpful.

Thanks much :)

ADD COMMENTlink written 18 months ago by kirannbishwa01500
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1447 users visited in the last hour