Question: pre-processing whole genome data
br.tania40 wrote:

Hi everyone!

I am new to whole genome analyses. For guidance, I am referring to 'Best Practices for Germline SNP & Indel Discovery in Whole Genome and Exome Sequence'.

As of now, I need to call variants in Rhesus Macaque paired-end reads (fasta files).

I mapped the reads with BWA against the most recent reference genome available, then marked duplicates with Picard. The next step is supposed to be base quality score recalibration. According to this link (https://software.broadinstitute.org/gatk/documentation/article?id=2801), it consists of four sub-steps. The suggested command for the first sub-step is:

java -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -R reference.fa \
    -I input_reads.bam \
    -L 20 \
    -knownSites dbsnp.vcf \
    -knownSites gold_indels.vcf \
    -o recal_data.table

My question is about the -knownSites options here.

Is a VCF file listing the known sites available for all organisms? On the NCBI website I can see that several known SNPs are listed for Macaca mulatta, but I cannot figure out how to obtain them in VCF format.

I would appreciate any pointers.

Thanks in advance!

written 2.6 years ago by br.tania

Take a look at this GATK thread for additional information.

written 2.6 years ago by genomax

Thanks! I will report back once I have tried the suggestions. I later also stumbled upon the NCBI collection of dbSNP entries for macaques.

written 2.5 years ago by br.tania
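
For readers in the same situation: when no curated known-sites VCF exists for an organism, one workaround described in the GATK documentation is to bootstrap one. You call variants on the unrecalibrated BAM, keep only the high-confidence calls, pass those as -knownSites, and repeat until the recalibration converges. The following is only a rough sketch of one such round, assuming GATK 3.x syntax (the same generation as the command in the question). The function names, the output file names, and the filter thresholds are illustrative assumptions, not values taken from this thread, and should be tuned to your data.

```shell
#!/usr/bin/env bash
# Sketch: one bootstrap round to build a known-sites VCF for BQSR (GATK 3.x).
# All file names and the filter expression below are placeholders/assumptions.

GATK_JAR="${GATK_JAR:-GenomeAnalysisTK.jar}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: bootstrap_round <reference.fa> <input.bam> <round-number>
bootstrap_round() {
    local ref="$1" bam="$2" round="$3"
    local raw="raw.round${round}.vcf"
    local flagged="flagged.round${round}.vcf"
    local known="known.round${round}.vcf"
    local table="recal.round${round}.table"
    local out="recal.round${round}.bam"

    # 1. Call variants on the current (not yet recalibrated) BAM.
    run java -jar "$GATK_JAR" -T HaplotypeCaller \
        -R "$ref" -I "$bam" -o "$raw" || return 1

    # 2. Flag low-confidence calls (thresholds are illustrative only).
    run java -jar "$GATK_JAR" -T VariantFiltration \
        -R "$ref" -V "$raw" \
        --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0" \
        --filterName "low_conf" -o "$flagged" || return 1

    # 3. Keep only the calls that passed the filter.
    run java -jar "$GATK_JAR" -T SelectVariants \
        -R "$ref" -V "$flagged" --excludeFiltered -o "$known" || return 1

    # 4. Build the recalibration table using the bootstrapped sites.
    run java -jar "$GATK_JAR" -T BaseRecalibrator \
        -R "$ref" -I "$bam" -knownSites "$known" -o "$table" || return 1

    # 5. Write the recalibrated BAM, which is the input for the next round.
    run java -jar "$GATK_JAR" -T PrintReads \
        -R "$ref" -I "$bam" -BQSR "$table" -o "$out" || return 1
}
```

Running `DRY_RUN=1 bootstrap_round reference.fa input_reads.bam 1` prints the five commands without executing them, which is a cheap way to check the file names first. For a second round, pass recal.round1.bam as the input and 2 as the round number.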

The BBMap package has a faster and easier option for recalibration, which does not need known sites. Usage:

calctruequality.sh in=mapped.bam ref=reference.fa ploidy=2 callvariants
bbduk.sh in=mapped.bam out=recalibrated.bam recalibrate
written 2.6 years ago by Brian Bushnell