I have been researching fair bit over the last few days about SNPs/InDels calling, My project pretty much exactly summarised here http://seqanswers.com/forums/showthread.php?t=57494&highlight=samtools+mpileup. My NGS data has also been generated by cripsr/cas9 method.
I used this method
HaplotypeCaller -R genome.fa -I sorted.bam -stand_call_conf 30 -stand_emit_conf 10 -o sorted-out.vcf -L chrX --downsampling_type NONE
and it sort of worked. I do get some InDels picked up. I want to call the same InDels with samtools mpileup. Two reasons:
1. I feel I'm not understanding how InDels/SNP being called in general
2. The actual project is a bit more involved. we are interested in both SNP and InDels some of which will only be found in fewer reads.
This is what I did to try replicate HyplotypeCaller witih samtools mpileup
samtools mpileup -g -f genome.fa sorted.bam > sorted-mpileup.bcf bcftools call -c -V snps sorted-mpileup.bcf | less
I get no InDels found however I do know that there is an InDel from both IGV and GATK HaplotypeCaller. And I also generate pile file and looked into it and it shows me that I do have and InDel...so what is going on...?
Also can anyone reiterate here why mpileup has max per-file depth set to 8000 and how does this effects my SNP/InDel calling given that I have ~ 400,000 reads in my bam files?
Also also when I do
samtool mpileup -g -f genome.fa sorted.bam > sorted-mpileup..bcf
I get DP tag included as a default and given that I have more than 8000 reads in my bam file DP tag for all SNP is 7999. But the question is what is the different when I include -t DP,DPR option during my mpileup..? because I end up with two DP tags in my vcf file with slightly different values..?
(this is really a testRun data and it is an amplicon data however in future I'll be calling SNP/InDel from genome wide NGS data)
Note please assume that I'm complete noob in variant calling and provide as much explanation as you can. much appreciate your time in advance
I understand this is really a few different questions in one, but I believe this will be a good recourse for others to refer to given that SNP/InDels calling is common topic and I couldn't find any good resources that would give a good guide for SNP/InDel calling hence me trying to include as much description about variant calling as I can and hope that others would to through comments and answers.