I need a dbsnp file in vcf format to run gatk's base quality recalibration for mycobacterium tuberculosis. How can I get it?
So far as I understand the current state of SNP data:
GATK variant calling isn't quite suited to bacterial projects, for just the reason you've found: it's designed for human SNP calling, and it seems to insist that you do everything that a human SNP project needs, even if those steps aren't appropriate to bacterial calling. I think the variant file is there so that the software can filter out known variants present in the background population that might be in your sample, to help you focus on the novel SNPs. But for bacteria, there's no variable background population like that, at least not one that you can download off the internet and just apply.
For instance, I've done bacterial projects where parental and resistant offspring DNA samples were given to me, and I could see the presence of mixed SNPs in the parental strain, many of which turned up as homozygous in the resistant offspring strains. So in that case GATK would have worked, because I could have given it the SNP file from the parental strain, and it would have made it easy to see which of the resistant strains' deviations from my downloaded reference were inherited from their parent, and which must be new to the offspring and therefore possibly granting resistance. But I would never have found such a file on dbSNP; it was specific to my parental strain.
And if dbSNP had a file that said that the F11 strain has the resistance-granting KatG mutation at amino acid 315, that doesn't mean that I want it filtered away when I examine another strain for possible resistance-granting mutations.
So I guess I'm not sure what the answer is. GATK doesn't seem to take no for an answer when it comes to providing that dbSNP list, so I'm inclined to think you are forced to use some other tool, like SAMtools.
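To make the parental-versus-offspring comparison above concrete, here is a minimal Python sketch of the idea, not anything GATK does internally. It assumes both samples were called against the same reference and that a variant can be matched on chromosome, position, REF and ALT alone. The function names (read_variants, split_by_parent) and the two tiny VCF snippets are made up for illustration; the positions are not real M. tuberculosis coordinates.

```python
import io


def read_variants(handle):
    """Collect (chrom, pos, ref, alt) keys from a VCF text stream."""
    variants = set()
    for line in handle:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A record may list several ALT alleles separated by commas.
        for alt in alts.split(","):
            variants.add((chrom, pos, ref, alt))
    return variants


def split_by_parent(parent, offspring):
    """Split offspring variants into those shared with the parent and those not."""
    inherited = sorted(offspring & parent)
    novel = sorted(offspring - parent)
    return inherited, novel


# Hypothetical demo data: made-up positions on a made-up contig name.
HEADER = "##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
PARENT_VCF = HEADER + "chr\t100\t.\tA\tG\t50\tPASS\t.\n" + "chr\t250\t.\tC\tT\t50\tPASS\t.\n"
OFFSPRING_VCF = (
    HEADER
    + "chr\t100\t.\tA\tG\t50\tPASS\t.\n"
    + "chr\t250\t.\tC\tT\t50\tPASS\t.\n"
    + "chr\t900\t.\tG\tC\t50\tPASS\t.\n"
)

parent = read_variants(io.StringIO(PARENT_VCF))
offspring = read_variants(io.StringIO(OFFSPRING_VCF))
inherited, novel = split_by_parent(parent, offspring)

print("inherited from parent:", inherited)
print("new in offspring:", novel)
```

With real data you would open the two VCF files instead of the in-memory strings. Matching on exact REF/ALT is a simplification: indels that are represented differently in the two files would not be recognised as the same variant.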
Thank you so much swbarnes :).
http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration
Base Quality Score Recalibration is a multistep process, and CountCovariates needs the dbSNP file. But there is a parameter, -run_without_dbsnp_potentially_ruining_quality (documented at http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_recalibration_CountCovariatesWalker.html), and using it I can run CountCovariates without the dbSNP file. I was unsure whether that is the right thing to do, so I wanted to run it with the file. I am a newbie in this field :)
Also, I ran UnifiedGenotyper on the BAM files I have and generated a VCF file, but the SNPs I expected to see in this file are missing.
I used the following command. Can you suggest some things to try so that I might see the expected result?
java -jar GenomeAnalysisTK-1.0.5974GenomeAnalysisTK-1.0.5974GenomeAnalysisTK.jar -R dataReference_Bacteria.fasta -I data/aligned_reads.bam -T UnifiedGenotyper -mbq 0 -o data/mycalls.vcf
Thank you so much neilfws