dbSNP File Needed For Bacteria
12.3 years ago
Ashu ▴ 10

I need a dbSNP file in VCF format to run GATK's base quality score recalibration for Mycobacterium tuberculosis. How can I get one?

dbsnp gatk vcf format • 3.8k views
12.3 years ago
Neilfws 49k

So far as I understand the current state of SNP data:

  • the best public source is dbSNP at NCBI
  • data for individual organisms is located in the organisms directory of the FTP site
  • for some organisms, VCF format is available for download
  • Mycobacterium tuberculosis is not one of those organisms
  • a dbSNP XML/ASN.1 -> VCF converter is a popular request but does not yet exist

Thank you so much neilfws

12.3 years ago
Swbarnes2 ★ 1.6k

GATK variant calling isn't quite suited to bacterial projects, for just the reason you've found: it's designed for human SNP calling, and it seems to insist that you do everything a human SNP project needs, even if those steps aren't appropriate for bacterial calling. I think the variant file is there so that the software can filter out known variants present in the background population that might be in your sample, to help you focus on the novel SNPs. But for bacteria, there's no variable background population like that, at least not one that you can download off the internet and just apply.

For instance, I've done bacterial projects where I was given DNA samples from a parental strain and its resistant offspring, and I could see the presence of mixed SNPs in the parental strain, many of which turned up as homozygous in the resistant offspring strains. In that case, GATK would have worked, because I could have given it the SNP file from the parental strain. That would have made it easy to see which of the resistant strains' deviations from my downloaded reference were inherited from the parent, and which must be new to the offspring and therefore possibly granting resistance. But I would never have found such a file on dbSNP; it was specific to my parental strain.
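The parental-versus-offspring comparison described above can be sketched in a few lines of Python. This is only a minimal illustration, not anything GATK does internally: the function names, the chromosome name `ref`, and the positions are all hypothetical, and the parser assumes plain tab-separated VCF records with the standard CHROM, POS, ID, REF, ALT columns.

```python
from typing import Iterable, List, Tuple

# A variant is identified by (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> List[Variant]:
    """Extract (chrom, pos, ref, alt) tuples from tab-separated VCF records."""
    variants: List[Variant] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        # A record may list several comma-separated alternate alleles.
        for alt in alts.split(","):
            variants.append((chrom, pos, ref, alt))
    return variants


def novel_variants(offspring: List[Variant], parental: List[Variant]) -> List[Variant]:
    """Return offspring variants that are absent from the parental strain."""
    parental_set = set(parental)
    return [v for v in offspring if v not in parental_set]


# Hypothetical example records (made-up chromosome name and positions).
parental_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t100\t.\tA\tG\t50\tPASS\t.",
    "ref\t250\t.\tC\tT\t50\tPASS\t.",
]
offspring_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "ref\t100\t.\tA\tG\t60\tPASS\t.",
    "ref\t250\t.\tC\tT\t60\tPASS\t.",
    "ref\t900\t.\tG\tC\t60\tPASS\t.",
]

new_in_offspring = novel_variants(
    parse_vcf_lines(offspring_vcf), parse_vcf_lines(parental_vcf)
)
print(new_in_offspring)
```

Only the variant at the hypothetical position 900 survives the comparison, since the other two are already present in the parental strain. Matching on the full (chrom, pos, ref, alt) tuple rather than on position alone keeps a different allele at a shared position from being discarded.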

And if dbSNP had a file that said that the F11 strain has the resistance-granting KatG mutation at amino acid 315, that doesn't mean that I want it filtered away when I examine another strain for possible resistance-granting mutations.

So I guess I'm not sure what the answer is. GATK doesn't seem to take no for an answer when it comes to providing that dbSNP list, so I'm inclined to think you are forced to use some other tool, like SAMtools.


Thank you so much swbarnes :).

http://www.broadinstitute.org/gsa/wiki/index.php/Base_quality_score_recalibration

Base Quality Score Recalibration is a multistep process. CountCovariates needs the dbSNP file, but there is a parameter, --run_without_dbsnp_potentially_ruining_quality (documented at http://www.broadinstitute.org/gsa/gatkdocs/release/org_broadinstitute_sting_gatk_walkers_recalibration_CountCovariatesWalker.html), that lets me run CountCovariates without it. I was unsure whether that is the right thing to do, so I wanted to run it with the file. I am a newbie in this field :)
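To see why skipping the known-sites file can distort recalibration, here is a deliberately simplified Python sketch. It is not GATK's actual implementation: the function name, the add-one smoothing, and the toy data are assumptions made for illustration. The underlying idea is that recalibration counts mismatches against the reference as sequencing errors, except at sites already known to be variable.

```python
import math
from typing import FrozenSet, Iterable, Tuple

# One observation is (reference position, base in the read, base in the reference).
Observation = Tuple[int, str, str]


def empirical_quality(
    observations: Iterable[Observation],
    known_sites: FrozenSet[int] = frozenset(),
) -> float:
    """Phred-scaled empirical quality, skipping positions listed in known_sites."""
    total = 0
    mismatches = 0
    for pos, read_base, ref_base in observations:
        if pos in known_sites:
            continue  # known variant site: a mismatch here is not counted as an error
        total += 1
        if read_base != ref_base:
            mismatches += 1
    # Add-one smoothing (an assumption of this sketch) avoids taking log10 of zero.
    error_rate = (mismatches + 1) / (total + 2)
    return -10 * math.log10(error_rate)


# Toy data: 980 matching bases, plus 20 reads carrying a true variant at position 500.
observations = [(pos, "A", "A") for pos in range(980)]
observations += [(500, "G", "A")] * 20

q_unmasked = empirical_quality(observations)
q_masked = empirical_quality(observations, known_sites=frozenset({500}))
print(f"without known sites: Q{q_unmasked:.1f}")
print(f"with known sites:    Q{q_masked:.1f}")
```

In this toy case the true variant at position 500 is counted as 20 sequencing errors when no known sites are supplied, so the estimated quality comes out lower than when that position is masked.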


Also, I ran UnifiedGenotyper on my BAM files and generated a VCF file, but the SNPs I expected to see are missing from it.

I used the following command. Can you suggest any parameters I could adjust so that I might see the expected result?

java -jar GenomeAnalysisTK-1.0.5974GenomeAnalysisTK-1.0.5974GenomeAnalysisTK.jar -R dataReference_Bacteria.fasta -I data/aligned_reads.bam -T UnifiedGenotyper -mbq 0 -o data/mycalls.vcf
