Question

pre-processing whole genome data

7.0 years ago

br.tania ▴ 50

Hi everyone!

I am new to whole genome analyses. For guidance, I am referring to 'Best Practices for Germline SNP & Indel Discovery in Whole Genome and Exome Sequence'.

As of now, I need to call variants in Rhesus Macaque paired-end reads (fasta files).

I mapped the reads to the most recent available reference genome using BWA and then marked duplicates using Picard. The next step is supposed to be base quality score recalibration (BQSR). According to this link (https://software.broadinstitute.org/gatk/documentation/article?id=2801), it consists of four sub-steps. The suggested command for the first sub-step is:

java -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R reference.fa \
    -I input_reads.bam \
    -L 20 \
    -knownSites dbsnp.vcf \
    -knownSites gold_indels.vcf \
    -o recal_data.table

My question is about the -knownSites options here.

Is a VCF file listing the known sites available for all organisms? On the NCBI website I can see that this information (several known SNPs) exists for Macaca mulatta, but I cannot figure out how to obtain it in VCF format.

I would appreciate any input.

Thanks in advance!

dbSNP.vcf gatk rhesus macaque • 1.8k views

Take a look at this GATK thread for additional information.
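For organisms without a curated known-sites VCF, the GATK documentation describes a bootstrapping approach: call variants on the un-recalibrated data, keep only the highest-confidence calls, and feed those to BaseRecalibrator as known sites, repeating until the results converge. The following is only a rough sketch of one round, written in the same GATK 3 style as the command in the question. The function names, the DRY_RUN switch, the output file names, and the filter thresholds are all assumptions for illustration and would need tuning for real data. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of one bootstrapping round for known sites (GATK 3 style syntax).
# DRY_RUN, run, gatk3 and bootstrap_known_sites are hypothetical helpers.
set -euo pipefail

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"

# Print a command; execute it only when DRY_RUN=0.
# Note: the printed form drops shell quoting, so it is for inspection only.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

gatk3() {
    run java -jar "$GATK_JAR" "$@"
}

bootstrap_known_sites() {
    local ref="$1" bam="$2" prefix="$3"

    # 1. Call variants on the un-recalibrated BAM.
    gatk3 -T HaplotypeCaller -R "$ref" -I "$bam" -o "${prefix}.raw.vcf"

    # 2. Flag low-confidence calls (thresholds are placeholders, not advice).
    gatk3 -T VariantFiltration -R "$ref" -V "${prefix}.raw.vcf" \
        --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0" \
        --filterName "bootstrap_filter" -o "${prefix}.flagged.vcf"

    # 3. Keep only the calls that passed the filter.
    gatk3 -T SelectVariants -R "$ref" -V "${prefix}.flagged.vcf" \
        --excludeFiltered -o "${prefix}.known.vcf"

    # 4. Use the surviving calls as provisional known sites.
    gatk3 -T BaseRecalibrator -R "$ref" -I "$bam" \
        -knownSites "${prefix}.known.vcf" -o "${prefix}.recal_data.table"
}

# Dry run with the file names from the question; set DRY_RUN=0 to execute.
bootstrap_known_sites reference.fa input_reads.bam bootstrap
```

After recalibrating, variants would be called again on the recalibrated BAM, and the round can be repeated with the new calls as known sites.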

7.0 years ago by GenoMax 141k

Thanks! I will report back once I have tried out the suggestions. I also later came across the NCBI dbSNP repository for macaques.

6.9 years ago by br.tania ▴ 50

The BBMap package has a faster and easier option for recalibration, which does not need known sites. Usage:

calctruequality.sh in=mapped.bam ref=reference.fa ploidy=2 callvariants
bbduk.sh in=mapped.bam out=recalibrated.bam recalibrate
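The two commands above could be wrapped in a small script so that the input, reference, and output names are set in one place. This is a minimal sketch: the function names and the DRY_RUN switch are hypothetical helpers, not part of BBMap, and the flags are exactly those shown above. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: run the two BBMap recalibration steps from one function.
# DRY_RUN, run and recalibrate_with_bbmap are hypothetical helpers.
set -euo pipefail

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

recalibrate_with_bbmap() {
    local bam="$1" ref="$2" out="$3"

    # Step 1: measure observed base qualities from the mapped reads.
    run calctruequality.sh in="$bam" ref="$ref" ploidy=2 callvariants

    # Step 2: apply the recalibration and write a new BAM.
    run bbduk.sh in="$bam" out="$out" recalibrate
}

# Dry run with the file names used above; set DRY_RUN=0 to execute.
recalibrate_with_bbmap mapped.bam reference.fa recalibrated.bam
```

Running it as `DRY_RUN=0 ./script.sh` would execute both steps, assuming the BBMap scripts are on the PATH.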

7.0 years ago by Brian Bushnell 20k