Question: Dbsnp File Needed For Bacteria
8.9 years ago, Ashu wrote:

I need a dbSNP file in VCF format to run GATK's base quality score recalibration for Mycobacterium tuberculosis. How can I get it?

8.9 years ago, Neilfws (Sydney, Australia) wrote:

So far as I understand the current state of SNP data:

  • the best public source is dbSNP at NCBI
  • data for individual organisms is located in the organisms directory of the FTP site
  • for some organisms, VCF format is available for download
  • Mycobacterium tuberculosis is not one of those organisms
  • a dbSNP XML/ASN.1 -> VCF converter is a popular request but does not yet exist

Thank you so much, Neilfws.

Reply written 8.9 years ago by Ashu
8.9 years ago, Swbarnes2 wrote:

GATK variant calling isn't quite suited to bacterial projects, for just the reason you've run into: it's designed for human SNP calling, and it seems to insist that you do everything that a human SNP project needs, even if those steps aren't appropriate to bacterial calling. I think the variant file is so that the software can filter out known variants present in the background population that might be in your sample, to help you focus on the novel SNPs. But for bacteria, there's no variable background population like that, at least not one that you can download off the internet and just apply.

For instance, I've done bacterial projects where parental and resistant offspring DNA samples were given to me, and I could see the presence of mixed SNPs in the parental strain, many of which turned up as homozygous in the resistant offspring strains. So in that case, GATK would have worked, because I could have given it the SNP file from the parental strain, and it would have made it easy to see which of the resistant strains' deviations from my downloaded reference were inherited from their parent, and which must be new to the offspring and therefore possibly granting resistance. But I would never have found such a file on dbSNP; it was specific to my parental strain.
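The parent-versus-offspring comparison described above does not strictly need GATK. Here is a minimal sketch in plain Python, assuming both call sets are ordinary tab-delimited VCF text and that a variant is identified by chromosome, position, reference allele and alternate allele. The function names and the example records are made up for illustration and are not from any real sample.

```python
def read_sites(vcf_lines):
    """Return the set of (chrom, pos, ref, alt) keys found in VCF text lines."""
    sites = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # one key per alternate allele
            sites.add((chrom, int(pos), ref, allele))
    return sites


def split_by_parent(parent_lines, offspring_lines):
    """Split offspring variants into (inherited, novel) relative to the parent."""
    parent = read_sites(parent_lines)
    offspring = read_sites(offspring_lines)
    return offspring & parent, offspring - parent


# Hypothetical records: the contig name and positions are invented.
HEADER = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO"
parent_vcf = [HEADER, "chr\t1000\t.\tA\tG\t50\t.\t."]
offspring_vcf = [
    HEADER,
    "chr\t1000\t.\tA\tG\t80\t.\t.",  # also present in the parent
    "chr\t2500\t.\tC\tT\t90\t.\t.",  # only in the offspring
]

inherited, novel = split_by_parent(parent_vcf, offspring_vcf)
print("inherited:", sorted(inherited))
print("novel:", sorted(novel))
```

Matching on exact position and alleles is the simplest possible rule; it ignores genotype and allele frequency, so a mixed SNP in the parent and a fixed SNP in the offspring count as the same site.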

And if dbSNP had a file that said that the F11 strain has the resistance-granting KatG mutation at amino acid 315, that doesn't mean that I want it filtered away when I examine another strain for possible resistance-granting mutations.

So I guess I'm not sure what the answer is. GATK doesn't seem to take no for an answer when it comes to providing that dbSNP list, so I'm inclined to think you are forced to use some other tool, like SAMtools.


Thank you so much swbarnes :).

Base Quality Score Recalibration is a multistep process. CountCovariates needs the dbSNP file. But there is a parameter, --run_without_dbsnp_potentially_ruining_quality, that lets me run CountCovariates without the dbSNP file. I was unsure whether this is the right thing to do, so I wanted to run it with the file. I am a newbie in this field :)
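One workaround often described for organisms without a dbSNP file is to bootstrap a known-sites file: call variants on the unrecalibrated reads, keep only the most confident calls, and supply those as the known sites. Below is a minimal sketch of just the filtering step, assuming a plain-text VCF. The function name is hypothetical and the QUAL cutoff of 100 is an arbitrary placeholder, not a GATK recommendation.

```python
def high_confidence_records(vcf_lines, min_qual=100.0):
    """Yield header lines plus records whose QUAL is at least min_qual."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue  # skip malformed lines and records with missing QUAL
        if float(fields[5]) >= min_qual:
            yield line


# Hypothetical raw calls: the contig name, positions and scores are invented.
raw_calls = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr\t100\t.\tA\tG\t250.5\t.\t.\n",  # kept: QUAL above the cutoff
    "chr\t200\t.\tC\tT\t12.0\t.\t.\n",   # dropped: QUAL below the cutoff
    "chr\t300\t.\tG\tA\t.\t.\t.\n",      # dropped: QUAL missing
]

known_sites = list(high_confidence_records(raw_calls))
print("".join(known_sites), end="")
```

To use it on real data, the same generator could be fed an open file handle and its output written with `writelines` to a new VCF (file names would be your own). Whether a single QUAL threshold is strict enough for a given data set is a judgment call.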

Reply written 8.9 years ago by Ashu

Also, I ran UnifiedGenotyper on the BAM files I have and generated a VCF file, but the SNPs I wanted to see in this file are missing.

I used the following command. Can you suggest some parameters to play around with so that I might see the expected result?

java -jar GenomeAnalysisTK-1.0.5974GenomeAnalysisTK-1.0.5974GenomeAnalysisTK.jar -R dataReference_Bacteria.fasta -I data/aligned_reads.bam -T UnifiedGenotyper -mbq 0 -o data/mycalls.vcf

Reply written 8.9 years ago by Ashu