Can anyone guide me on which standard GATK tools to use when making a VCF file from an alignment SAM file? I'm kind of stuck and confused about which one to use and when.
Where are you stuck? What confuses you? Please tell us how far you've gotten in your quest to solve this.
I have used Picard for SortSam and MarkDuplicates. Which tool should I use next?
GATK Best Practices says base quality score recalibration (BQSR, run with the BaseRecalibrator tool) is next. Have you looked at that resource?
I am trying to follow that one, but base recalibration requires a known-sites input: -knownSites latest_dbsnp.vcf.
I have a whole bacterial genome, so what should I do with this parameter? And what about the next step?
Ah, a bacterial genome. You should have mentioned that in your original question; I'd have warned you against GATK, since I think it might not be great for non-human genomes. Maybe try samtools/bcftools?
Oh, OK. I already used the samtools/bcftools pipeline and made a VCF file with it, but I wanted to cross-check that against the GATK pipeline.
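For the cross-check mentioned here, one simple option is to compare the two call sets record by record. Below is a minimal Python sketch, assuming both VCFs (plain or gzipped) were called against the same reference. The function names and the file names in the usage note are hypothetical, and exact matching on (CHROM, POS, REF, ALT) can undercount agreement for indels that the two callers represent differently, so normalizing both files first (for example with `bcftools norm`) is probably a good idea.

```python
import gzip


def load_variants(path):
    """Return a set of (chrom, pos, ref, alt) keys from a VCF file."""
    variants = set()
    # Assumption: a ".gz" suffix means the file is gzip/bgzip compressed.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            # Split multi-allelic records so each ALT allele is compared separately.
            for allele in alt.split(","):
                variants.add((chrom, int(pos), ref, allele))
    return variants


def compare_callsets(path_a, path_b):
    """Split two call sets into shared calls and calls unique to each file."""
    a = load_variants(path_a)
    b = load_variants(path_b)
    return {"shared": a & b, "only_a": a - b, "only_b": b - a}
```

Usage would look something like `result = compare_callsets("samtools.vcf", "gatk.vcf")` followed by `len(result["shared"])`. Calls found by only one pipeline are the ones worth inspecting by eye. `bcftools isec` does a similar intersection on bgzipped, indexed VCFs if you prefer to stay on the command line.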
One more thing: I have a genome of around mb, and the samtools pipeline gave me a VCF file with around 2300 variants. Is that normal, or did I make a mistake?
If you were sequencing what was supposed to be the exact same strain of bacteria as your reference, that's kind of high. If it's not the same strain, that number might be appropriate. Eyeball some of the putative SNPs in IGV and see if you can verify visually that the reads show the SNPs in the VCF.
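To put a variant count like that in context and pick sites to look at in IGV, a rough summary of the VCF can help. This is a minimal Python sketch, not a replacement for proper QC: the function names are hypothetical, `genome_size_bp` is a value you supply yourself (the genome size is not stated in this thread), and the file is assumed to be an uncompressed VCF.

```python
def summarize_vcf(path, genome_size_bp):
    """Count SNP and non-SNP records and report variant density per kb."""
    snps = other = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            ref, alts = fields[3], fields[4].split(",")
            # A record counts as a SNP only if REF and every ALT are single bases.
            if len(ref) == 1 and all(
                len(a) == 1 and a.upper() in "ACGT" for a in alts
            ):
                snps += 1
            else:
                other += 1  # indels, MNPs, symbolic alleles
    total = snps + other
    return {
        "snps": snps,
        "other": other,
        "total": total,
        "per_kb": total * 1000 / genome_size_bp,
    }


def igv_regions(path, limit=10, flank=50):
    """Return up to `limit` 'chrom:start-end' strings around the first variants."""
    regions = []
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t")[:2]
            start = max(1, int(pos) - flank)  # coordinates are 1-based
            regions.append(f"{chrom}:{start}-{int(pos) + flank}")
            if len(regions) >= limit:
                break
    return regions
```

The region strings can be pasted into the IGV locus box one at a time. If many variants cluster in a few windows, or most calls have low read support, that points toward mapping or calling artifacts rather than real strain differences.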
OK, that helped. Thanks a lot.