Question: Best method to call SNPs/genotype many bam files from only a defined list of variants/SNPs?
gravatar for mmats010
4.6 years ago by
mmats01080 wrote:

Let's say I generated a list of 25,000 SNPs across the genome of my particular organism (which is a diploid non-model with no publically available variant resources). So, given this list of SNPs (in an appropriate format, VCF?) is there a standard method available to call genotypes from input bam files for ONLY the 25,000 variants on this list? My ideal output would be a file with 25,000 genotype calls for each individual input bam file, with the actual genotype calls being along the lines of "AG" or "GG", for example.

Essentially, is there a way to do the same thing you would do for a genotyping microarray, but applying it to illumina reads?

I'm vaguely aware that samtools mpileup can do something like this, but I'm looking for a simpler method to actually call the genotypes that doesn't involve manually calculating read ratios per site (although that data could still be useful). Could you force Haplotypecaller to do something like this?

As for my actual application: I am illumina sequencing ~70 members of a mapping population with the intent of building a linkage map to validate and scaffold a reference genome assembly.

Thanks, Mike

ADD COMMENTlink modified 4.3 years ago • written 4.6 years ago by mmats01080

I think that GATK Haplotypecaller using flag --dbsnp your_25000_SNPs.vcf should answer your question.

ADD REPLYlink written 4.6 years ago by Nicola Casiraghi450
gravatar for igor
4.6 years ago by
United States
igor11k wrote:

GATK HaplotypeCaller has an --alleles option:

The set of alleles at which to genotype when --genotyping_mode is GENOTYPE_GIVEN_ALLELES When the UnifiedGenotyper is put into GENOTYPE_GIVEN_ALLELES mode it will genotype the samples using only the alleles provide in this rod binding

ADD COMMENTlink written 4.6 years ago by igor11k
gravatar for mmats010
4.3 years ago by
mmats01080 wrote:

Hi @igor and @Nicola Casiraghi, Unfortunately, both of your approaches seem to have failed for me.

After carefully filtering a file containing off of my SNPs (with a PASS during filtering) I am interested in, I came up with a my_25000_SNPs.vcf file. However, if i pass this VCF file into the --dbSNP argument, while in "--genotyping_mode GENOTYPE_GIVEN_ALLELES" mode, the output file gives me exactly zero results. Annoyingly, the process still takes as long as one might expect HaplotypeCaller to process ~25000 sites but doesn't give me an error message. The output file is my VCF file but with zero variant rows.

I will add that I have tried variations of both of your solutions, as well as adding -L my_25000_SNPs.vcf while in both "-DISCOVERY" mode and "GENOTYPE_GIVEN_ALLELES " mode for both a given --dbsnp file.vcf or -L file.vcf

should I give a file.table (two columns only, output of VariantsToTable) in the -L option?

Seems more complicated than it should be, Mike

ADD COMMENTlink written 4.3 years ago by mmats01080
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1160 users visited in the last hour