I am looking for a good workflow, reading list, or tutorial for SNP calling. I have read some other posts on this topic, but I would like a more detailed explanation. Population genomics and sequence data are new to me (I have a general CS and biology background). It might just be me, but these tools are not as straightforward or as well documented as I'd like. Any links or explanations would be welcome!
So far, my situation is as follows:
As you can tell, I am totally new to this. It is pretty exciting, so I want to learn and be able to do some of these things! Thanks in advance.
Edit: I also get confused by some of the output, so more detailed documentation on that would be nice as well!
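Map the reads with BWA; the default settings are probably OK. The aln step produces the s.sai file that samse needs (this assumes hg37.fasta has already been indexed with bwa index).
bwa aln hg37.fasta s.fastq > s.sai
bwa samse hg37.fasta s.sai s.fastq > s.sam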
samtools view -S -b s.sam > s.bam
Sort BAM file (will create s_sort.bam)
samtools sort s.bam s_sort
Call variants, i.e. create the VCF file (bcftools is part of the samtools distribution)
samtools mpileup -uf hg37.fasta s_sort.bam | bcftools view -vcg - > s.vcf
There is a lot more (like local realignment, etc.). But if this is your first time doing it, you should start with the basics.
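The commands above use the older samtools 0.1.x syntax. As a rough sketch only, here is the same single-end flow written for samtools/bcftools 1.x, where sorting takes -o and variant calling is done with bcftools mpileup and bcftools call. The file names follow the commands above; the REF, READS, PREFIX and DRY_RUN variables and the run helper are my own additions for illustration. By default the script only prints the commands, so you can read them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: single-end reads -> VCF with bwa, samtools 1.x and bcftools 1.x.
# Assumes paths contain no spaces (commands are passed to eval as strings).
set -euo pipefail

REF="${REF:-hg37.fasta}"      # reference genome (name taken from the answer)
READS="${READS:-s.fastq}"     # single-end reads
PREFIX="${PREFIX:-s}"         # prefix for all output files
DRY_RUN="${DRY_RUN:-1}"       # illustrative switch: 1 = print only, 0 = execute

# Print a command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        eval "$*"
    fi
}

snp_pipeline() {
    run "bwa index $REF"                                    # index the reference once
    run "bwa aln $REF $READS > $PREFIX.sai"                 # align reads
    run "bwa samse $REF $PREFIX.sai $READS > $PREFIX.sam"   # write SAM
    run "samtools view -b -o $PREFIX.bam $PREFIX.sam"       # SAM -> BAM
    run "samtools sort -o ${PREFIX}_sort.bam $PREFIX.bam"   # coordinate sort
    run "samtools index ${PREFIX}_sort.bam"                 # index the sorted BAM
    run "bcftools mpileup -f $REF ${PREFIX}_sort.bam | bcftools call -mv -Ov -o $PREFIX.vcf"
}

snp_pipeline
```

Run it as is to see the seven commands, then with DRY_RUN=0 once bwa, samtools and bcftools are on your PATH. Check the flags against the versions you have installed, since options have changed between releases.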
We are working on a SNP pipeline now. You might find my work-in-progress pipeline useful.
The pipeline currently starts with an alignment from BWA. It uses GATK for SNP calling.
Briefly, the flow involves:
We are still fleshing out the details on filtering and such, but it might be a good starting point for running the GATK steps in a working order.
I strongly recommend this recent article from the authors of GATK.
It covers various aspects of SNP calling in detail. At the same time, do refer to the software manual/wiki for the up-to-date options incorporated in the toolkit.
Have a look at this course material from UT-Austin: https://wikis.utexas.edu/display/bioiteam/SSC+Intro+to+NGS+Bioinformatics+Course
I have put together a tutorial website with four core tutorials on it (RNA-Seq, ChIP-Seq, genome assembly, and SNP calling) that may be of use to you.
This website was created to share bioinformatics tutorials and create a dynamic learning environment that does not become dated. PDF contributions are welcome. We would be interested to get some feedback.