Scatter / Gather for BaseRecalibrator on a single human WES dataset?
4.2 years ago
asg • 0

Hello everyone! Please excuse me if this question is a bit naïve: I'm new to bioinformatics in general and GATK in particular.

I am using the GATK4 suite to ultimately call germline variants on whole exome sequencing data obtained from an Illumina NextSeq 550 sequencer. (For a variety of reasons I cannot use the WDL/Cromwell setup recommended by the Best Practices, so I am trying to replicate the recommended workflow as a series of Bash scripts.)

I would like to speed up the BQSR step by employing the Scatter / Gather strategy. However, studying this article (https://gatk.broadinstitute.org/hc/en-us/articles/360035890531-Base-Quality-Score-Recalibration-BQSR-), I've realized that BaseRecalibrator requires a lot of data to build a proper statistical model.

My question: is it okay to scatter the BaseRecalibrator job by chromosome if I analyze just one WES sample at a time? (I know that downstream I will need to perform joint genotyping with 30+ samples, but at the moment I'm preparing single-sample BAM files one by one.)
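
For reference, the scatter/gather layout in question can be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: the file names, the `chr`-prefixed contig names, and the `DRY_RUN` switch are all placeholders, and the loop runs the pieces one after another, whereas a real setup would submit each `BaseRecalibrator` call as its own job. Note that the per-contig tables are merged with `GatherBQSRReports` before `ApplyBQSR` runs, so, as far as I understand, the report that is finally applied combines the counts from every scattered piece.

```bash
#!/usr/bin/env bash
# Sketch only: scatter BaseRecalibrator by contig, gather the per-contig
# tables into ONE report, then apply it. All file names are placeholders.
set -euo pipefail

REF=${REF:-ref.fasta}          # hypothetical reference FASTA
BAM=${BAM:-sample.dedup.bam}   # hypothetical duplicate-marked input BAM
KNOWN=${KNOWN:-dbsnp.vcf.gz}   # hypothetical known-sites VCF
DRY_RUN=${DRY_RUN:-1}          # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Scatter: one recalibration table per contig.
scatter_recal() {
  local contig
  for contig in "$@"; do
    run gatk BaseRecalibrator -R "$REF" -I "$BAM" \
      --known-sites "$KNOWN" -L "$contig" \
      -O "recal.${contig}.table"
  done
}

# Gather: merge all per-contig tables into a single report.
gather_recal() {
  local args=() contig
  for contig in "$@"; do
    args+=(-I "recal.${contig}.table")
  done
  run gatk GatherBQSRReports "${args[@]}" -O recal.merged.table
}

# Contig names are an assumption; they must match the reference dictionary.
CONTIGS=(chr{1..22} chrX chrY)

scatter_recal "${CONTIGS[@]}"
gather_recal "${CONTIGS[@]}"

# Apply the merged report to the whole BAM.
run gatk ApplyBQSR -R "$REF" -I "$BAM" \
  --bqsr-recal-file recal.merged.table -O sample.recal.bam
```

With the default `DRY_RUN=1` the script only prints the commands it would run, which makes it easy to check the layout before pointing it at real data.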

The article above says specifically that BaseRecalibrator expects each read group to have at least 100M bases. Calculated naively, PF_HQ_ALIGNED_BASES / 23 = 215+ megabases (the metric is taken from the CollectAlignmentSummaryMetrics output).
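
The naive estimate above amounts to the following check. The total used here is a placeholder, namely the lower bound implied by the figures in this post (215 Mb × 23), not the actual metric value, and the helper names are made up for illustration. The calculation also assumes that bases are spread evenly across chromosomes, which exome targets are not.

```bash
#!/usr/bin/env bash
# Naive per-chunk estimate: total bases split evenly across N chunks,
# compared with the 100M-base guideline quoted from the BQSR article.

# Integer division of the total base count by the number of chunks.
bases_per_chunk() {
  local total=$1 chunks=$2
  echo $(( total / chunks ))
}

# Succeeds if the per-chunk count reaches the minimum (default 100M).
meets_guideline() {
  local per_chunk=$1 min=${2:-100000000}
  [ "$per_chunk" -ge "$min" ]
}

total=4945000000   # placeholder: 215 Mb x 23, the lower bound implied above
per_chunk=$(bases_per_chunk "$total" 23)

if meets_guideline "$per_chunk"; then
  echo "$per_chunk bases per chunk: above the guideline"
else
  echo "$per_chunk bases per chunk: below the guideline"
fi
```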

Thank you!

— Alex.

P.S. This is a repost of my question from the GATK forum. I apologize if this is generally frowned upon, but since this is not a technical issue with the tool itself, the team could not offer any guidance as of yet.

next-gen gatk bqsr exome human • 1.2k views

Could you post your BQSR command, please? How many samples do you have? In my experience, BQSR should not take much time.
