I am very new to this kind of work so I would appreciate some opinions on the early stages of a SNP/Indel discovery workflow.
- Formulate a list of genes of interest.
- Design a custom capture for all exons of these genes.
- Sequence (Illumina, paired-end, 100 bp reads, 50x coverage minimum).
- Remove duplicate paired-end reads.
- Align to the genome (BFAST, to allow gaps and find indels as well as SNPs).
- Use SAMtools mpileup with varFilter to remove SNPs with a quality score below 20 and indels with a quality score below 50.
- Remove SNPs with low coverage (less than 30x?)
- Proceed to an association-type study (details to be worried about later; my major concern at the moment is the upstream NGS stuff, as I have never done it before).
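To make the two filtering steps concrete, here is a rough Python sketch of the thresholds as I understand them, applied to VCF text. It is only an illustration, not what varFilter itself does. It assumes that indels are marked with an `INDEL` flag in the INFO column and that read depth is given as `DP` in INFO, as in mpileup-style output. The function and constant names are made up for this example.

```python
# Sketch of the quality/coverage thresholds from the list above.
# Assumptions: tab-separated VCF lines, indels flagged by INDEL in INFO,
# depth stored as INFO/DP. All names here are hypothetical.

MIN_SNP_QUAL = 20.0    # drop SNPs with QUAL below this
MIN_INDEL_QUAL = 50.0  # drop indels with QUAL below this
MIN_SNP_DEPTH = 30     # drop SNPs with coverage below this


def parse_info(info_field):
    """Turn 'DP=45;INDEL' into {'DP': '45', 'INDEL': True}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def keep_variant(vcf_line):
    """Return True if a single VCF record passes the thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    if fields[5] == ".":
        return False  # no quality score, so it cannot pass a quality cutoff
    qual = float(fields[5])
    info = parse_info(fields[7])
    if info.get("INDEL", False) is True:
        # The list only mentions a quality cutoff for indels.
        return qual >= MIN_INDEL_QUAL
    depth = int(info.get("DP", 0))
    return qual >= MIN_SNP_QUAL and depth >= MIN_SNP_DEPTH


def filter_vcf(lines):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or keep_variant(line):
            yield line
```

It could be run over an open VCF file with something like `for line in filter_vcf(open("calls.vcf")): ...`, where `calls.vcf` is a placeholder file name.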
All comments welcome. Thanks in advance!