Question: GATK quality recalibration to get recalibrated BAM files before SNP calling
0
gravatar for alicia.poppy
11 months ago by
alicia.poppy40
alicia.poppy40 wrote:

I'm having problems with quality score recalibration using GATK as most tutorials/examples use an old version of GATK which has different syntax and arguments to the current version.

I have run this to get a recalibration table, which I think I need to get the recalibrated BAM files:

gatk-4.1.2.0/gatk BaseRecalibrator \
-reference human_genome/hg38_masked.fa \
--known-sites human_genome/All_20180418.vcf.gz \
--input S01_BAM_file.bam \
--output recal.table

This works fine, and I could run this for all of my samples.

But when I then do the next step to get the recalibrated BAM files I can't work out what to do.

*The old way of doing it was:

 java –jar GenomeAnalysisTK.jar –T PrintReads \ 
–R human.fasta \
–I realigned.bam \ 
–BQSR recal.table \ 
–o recal.bam*

I wrote:

gatk-4.1.2.0/gatk PrintReads \
-R data_R3/human_genome/hg38_masked.fa \
--input S01_BAM_file.bam \
-**BQSR** recal.table \
 --output quality_score_recalibrated_S01.bam

But this makes an error that says: "A USER ERROR has occurred: B is not a recognized option". And I can't find an alternative argument for -BQSR when I look at the PrintReads help manual.

Does anyone know the current way to run this quality score recalibration to get recalibrated BAM files?

snp recalibration bqsr gatk • 1.2k views
ADD COMMENTlink modified 11 months ago by Nicolas Rosewick8.8k • written 11 months ago by alicia.poppy40
1

Haven't used v4, but from GATK site

https://gatkforums.broadinstitute.org/gatk/discussion/comment/43986#Comment_43986

ADD REPLYlink written 11 months ago by Santosh Anand5.1k
3
gravatar for Nicolas Rosewick
11 months ago by
Belgium, Brussels
Nicolas Rosewick8.8k wrote:

you should use ApplyBQSR :

https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_bqsr_ApplyBQSR.php

gatk BaseRecalibrator \
   -I input.bam \
   -R reference.fasta \
   --known-sites sites_of_variation.vcf \
   --known-sites another/optional/setOfSitesToMask.vcf \
   -O recal_data.table

then

 gatk ApplyBQSR \
   -R reference.fasta \
   -I input.bam \
   --bqsr-recal-file recal_data.table \
   -O output.bam
ADD COMMENTlink modified 11 months ago • written 11 months ago by Nicolas Rosewick8.8k

Hi! @Nicolas Rosewick.Your comment was helpful. However, when I tried, your command was giving me the following error: A USER ERROR has occurred: '--bqsr_recal_file' is not a valid command. I tried the below-mentioned command and it worked. gatk ApplyBQSR \ -R reference.fasta \ -I input.bam \ --bqsr recal_data.table \ -O output.bam

So I would point using bqsr in small case rather than the syntax in which mentioned on the GATK website.

ADD REPLYlink written 9 months ago by rohitsatyam10290

Have you found a resolution? I am having the same issue doing it the way Nicholas suggested. Thanks

ADD REPLYlink written 6 months ago by someler0

Could you post your command and gatk version ?

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.8k

**UPDATE:*

I'm not sure what I did differently honestly, but I now have output files, but they are very small so I know the script was not successful.

I now have a .bam, .bai and a .sbi file.

Here is my code

gatk BaseRecalibratorSpark \ --reference ref.genome.fa \ --input filename.bam \ --known-sites KnownSites.vcf \ --output recal_data.table

and then I follow up with:

gatk ApplyBQSRSpark \ --reference ref.genome.fa \ -I filename.bam \ --bqsr-recal-file recal_data.table \ -O output.bam \

I was getting this error:

A USER ERROR has occurred: Argument output was missing: Argument 'output' is required.


Using GATK jar /gpfs01/tools/gatk-4.1.2.0/gatk-package-4.1.2.0-local.jar line 51: --input: command not found line 55: --output: command not found

ADD REPLYlink modified 6 months ago • written 6 months ago by someler0

Do you have a reason to use the spark version ? Did you try the “standard” version of BQSR ?

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.8k

Yes, I was using the standard version but it takes ~2 weeks to run exome files. (Is that customary for you?) I am trying to process ~300 more in the next 8 weeks. I am looking for a quicker alternative and I read abou Spark this morning. Will compare the files in the end.

ADD REPLYlink written 6 months ago by someler0

An exome file should run under a day for sure (in my case it takes 2-3 hours on my local cluster depending on the number of reads). Could you tell me more on the computational resource you have ?

ADD REPLYlink written 6 months ago by Nicolas Rosewick8.8k

Yes, for sure, it will definitely run in under a day. My files are averaging ~5-6 hours. I expect that because I use a node that is shared. If belongs to another PI; if their lab is using it, my job becomes paused. I am using a computing node with 28 cores, 126 GB of memory.

Consider that I have in total 500 files. I have process so far ~200, and I have another 300 left. Basically, it will take me 20-30 more days than I have left in my lab, and I have a hard deadline.

Additionally, if I could make this process easier for lab mates coming behind me, that would be a plus too!

Any insight on this would be extremely appreciated.

ADD REPLYlink written 6 months ago by someler0

you don’t have access to an other cluster (your institution maybe has a bigger one ? ). One node with 28 cores is clearly not a lot to process 500 exomes.. however BQSR use only one core per sample. So you could in theory process 28 samples in parallel thus dropping the compute time for 300 samples to ~65h. Of course you will use all cores of the node...

ADD REPLYlink modified 6 months ago • written 6 months ago by Nicolas Rosewick8.8k

I'd like to continue this conversation via email, if possible. Can I email you?

ADD REPLYlink written 6 months ago by someler0

@rohitsatyam102 It’s --bqsr-recal-file not --bqsr_recal_file

ADD REPLYlink modified 6 months ago • written 6 months ago by Nicolas Rosewick8.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1714 users visited in the last hour