Question: Why is the optimal value of BQSR intervals 20?
bluemonster0808 (40), China, wrote 19 months ago:

I see a hint in SevenBridges "BQSR intervals optimal value is 20 or chr20 *".

I tried running GATK BaseRecalibrator with and without -L 20, and the result files are nearly the same. Why? Don't we need to do BQSR on chromosomes other than chr20?

gatk wgs • 923 views
modified 17 months ago by dariober (10.0k) • written 19 months ago by bluemonster0808 (40)

Here is my command with -L 20

java -Xmx50000M -jar GenomeAnalysisTK-3.5-0-g36282e4/GenomeAnalysisTK.jar \
    --analysis_type BaseRecalibrator \
    -nct 48 \
    --out CCLE-HCC1143-DNA-10_Illumina.converted.sorted.deduped.recal_L20data.grp \
    --disable_indel_quals \
    --reference_sequence human_g1k_v37_decoy.fasta \
    --input_file CCLE-HCC1143-DNA-10_Illumina.converted.sorted.deduped.bam \
    --knownSites dbsnp_137.b37.vcf \
    --knownSites 1000G_phase1.indels.b37.vcf \
    --knownSites Mills_and_1000G_gold_standard.indels.b37.sites.vcf \
    -L 20

modified 19 months ago • written 19 months ago by bluemonster0808 (40)

Do yourself a favor and omit the BQSR step. Some papers published in recent years, as well as comments here on Biostars, state that BQSR has little to no effect on variant calling. Just use the search function here to get some details. EDIT: The more I searched around, the more I found others stating that BQSR is beneficial, so I have to qualify my comment above.

modified 17 months ago • written 19 months ago by ATpoint (15k)

Thank you for your reply. As a newbie, I see that the GATK Best Practices recommend doing BQSR: https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS

written 19 months ago by bluemonster0808 (40)

Hi bluemonster0808,

I see a hint in SevenBridges "BQSR intervals optimal value is 20 or chr20 *".

Do you have a source for this quotation? A quick search of the web using Google suggests that the quote as you've put it doesn't exist (?), but maybe you heard it from a colleague or read it in a SevenBridges white paper?

written 17 months ago by Kevin Blighe (41k)

https://ibb.co/cT0if6

You can see the screenshot at the link above. I saved it on a free image host.

modified 17 months ago • written 17 months ago by bluemonster0808 (40)
dariober (10.0k), WCIP | Glasgow | UK, wrote 17 months ago:

Since this question has returned to the top, I'll take a shot at answering it...

Base quality recalibration as implemented in the GATK interface consists of two steps. The first step collects statistics about biases; the second step actually edits the BAM records to recalibrate them.

The first step (GenomeAnalysisTK.jar -T BaseRecalibrator ...) doesn't require the entire genome. In fact, one chromosome may suffice to collect enough data for accurate statistics (see also Downsampling to reduce time), hence the suggestion to use -L chr20. In turn, this explains why running without -L, i.e. using the entire genome, gives effectively the same results. The second step then applies those statistics to all reads, so chromosomes other than chr20 are still recalibrated.
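The two steps can be sketched roughly as follows for GATK 3.x. This is only an illustration, not the exact pipeline from the question: the BAM, table and output file names are placeholders, only one --knownSites file is shown, and the run helper and the RUN switch are hypothetical conveniences so the script prints the commands by default instead of executing them.

```shell
#!/bin/sh
set -eu

# Placeholder paths (assumptions); replace them with your own files.
GATK_JAR="GenomeAnalysisTK.jar"
REF="human_g1k_v37_decoy.fasta"
BAM="sample.deduped.bam"
KNOWN="dbsnp_137.b37.vcf"
TABLE="sample.recal_L20.grp"
OUT="sample.recal.bam"

# Hypothetical helper: print the command unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Step 1: collect covariate statistics, restricted to chromosome 20.
recalibrate() {
  run java -jar "$GATK_JAR" -T BaseRecalibrator \
    -R "$REF" -I "$BAM" -knownSites "$KNOWN" \
    -L 20 -o "$TABLE"
}

# Step 2: apply the table to the reads. There is no -L here,
# so reads on every chromosome are recalibrated.
apply() {
  run java -jar "$GATK_JAR" -T PrintReads \
    -R "$REF" -I "$BAM" -BQSR "$TABLE" -o "$OUT"
}

recalibrate
apply
```

Note that the interval name has to match the contig naming of the reference: b37-style references such as human_g1k_v37_decoy.fasta use 20, while UCSC-style references use chr20.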

written 17 months ago by dariober (10.0k)

This sounds reasonable. Thank you!

written 17 months ago by bluemonster0808 (40)