Your workflow might look something like this:
Generate VCF files of SNPs and indels for your trios with GATK, then annotate them with ANNOVAR or SeattleSeq. Also use GATK to calculate depth of coverage over your target regions.
Start by filtering out SNPs that are already in dbSNP; these are unlikely to be pathogenic variants. Be careful, though: some dbSNP entries are rare disease alleles, so you may need to go back and reanalyze...
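A minimal sketch of that dbSNP filter, assuming your VCF already carries rsIDs in the ID column (for example because dbSNP was supplied at calling or annotation time). The function name `split_by_dbsnp` is made up for illustration. It keeps the known variants in a separate list instead of throwing them away, so you can go back to them later:

```python
def split_by_dbsnp(vcf_lines):
    """Split VCF records into (novel, known) based on an rsID in the ID column.

    Assumption: column 3 (ID) holds 'rs...' identifiers for variants that
    are in dbSNP and '.' (or some other ID) otherwise.
    """
    novel, known = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        ids = fields[2].split(";")  # the ID column may hold several IDs
        if any(i.startswith("rs") for i in ids):
            known.append(line)
        else:
            novel.append(line)
    return novel, known
```

You would call it on an open file handle, e.g. `novel, known = split_by_dbsnp(open("trio.vcf"))`, where `trio.vcf` is a placeholder file name.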
If you have scripting skills in something nice like Perl or Python, write a couple of scripts to pull out the nonsynonymous (nonsense, missense) variants that fit your hypothesis (you mention de novo/sporadic). This gives you a shortened list of potential disease-causing variants.
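Such a script could look roughly like the sketch below. It assumes you have already parsed each variant into a dict holding the trio's GT strings and a functional label; the keys (`effect`, `child`, `father`, `mother`) and the labels in `DAMAGING` are hypothetical, so map them to whatever your annotation tool actually writes:

```python
import re

# Hypothetical effect labels; adjust to your annotator's vocabulary.
DAMAGING = {"missense", "nonsense"}

def alleles(gt):
    """Split a VCF GT string such as '0/1' or '0|1' into allele codes."""
    return re.split(r"[/|]", gt)

def is_de_novo(child_gt, father_gt, mother_gt):
    """True if the child carries a non-reference allele seen in neither parent."""
    child = alleles(child_gt)
    parents = alleles(father_gt) + alleles(mother_gt)
    if "." in child or "." in parents:
        return False  # a missing genotype cannot support a de novo call
    return any(a != "0" and a not in parents for a in child)

def candidate_variants(records):
    """Keep nonsynonymous variants that look de novo in the trio."""
    return [
        r for r in records
        if r["effect"] in DAMAGING
        and is_de_novo(r["child"], r["father"], r["mother"])
    ]
```

For a recessive or compound-heterozygous hypothesis you would swap `is_de_novo` for a different genotype test and leave the rest alone.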
Use your depth-of-coverage data to weed out further variants in areas of low coverage, where the calls may be unreliable. Then again, be careful, they might be real, and you may need to go back and reanalyze...
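As a rough sketch of the coverage step: assume you have loaded the depth output into a dict that maps `(chrom, pos)` to the per-sample depths of the trio. The function name and the `min_depth=10` cutoff are placeholders I picked, not recommended values. Requiring the minimum across all three samples matters for de novo calls, since a poorly covered parent can hide an inherited allele:

```python
def split_by_coverage(variants, depth_at, min_depth=10):
    """Split variant positions into (confident, low_coverage).

    variants  : iterable of (chrom, pos) tuples
    depth_at  : dict mapping (chrom, pos) -> list of per-sample depths
    min_depth : hypothetical cutoff; tune it for your own data
    """
    confident, low_coverage = [], []
    for v in variants:
        depths = depth_at.get(v)
        # A position with no depth record is treated as low coverage.
        if depths and min(depths) >= min_depth:
            confident.append(v)
        else:
            low_coverage.append(v)
    return confident, low_coverage
```

The low-coverage list is returned instead of discarded so that you can revisit it if the confident set comes up empty.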
Annotate your shortened list of variants against the Exome Variant Server, and kick out the variants seen there, since those are likely to be not-so-rare alleles that do not cause disease.
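One way to sketch this step, assuming you have already pulled Exome Variant Server frequencies into a dict keyed by `(chrom, pos, ref, alt)`. How you build that dict depends on the export format you download, so it is left out here. The `max_maf=0.01` cutoff and the function name are placeholders:

```python
def drop_common(variants, evs_maf, max_maf=0.01):
    """Split variants into (rare, common) using population allele frequency.

    variants : iterable of (chrom, pos, ref, alt) tuples
    evs_maf  : dict mapping the same tuples to a minor allele frequency
    max_maf  : hypothetical cutoff; variants above it count as common
    """
    rare, common = [], []
    for v in variants:
        maf = evs_maf.get(v)
        if maf is not None and maf > max_maf:
            common.append(v)
        else:
            rare.append(v)  # unseen in EVS, or seen only at low frequency
    return rare, common
```

Using a frequency cutoff instead of dropping everything seen in the database is a softer version of the filter; set `max_maf=0` if you want to remove any variant with a nonzero frequency.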
Mix well, and repeat steps as needed. Remember, you may need to alter key parameters at each step and reanalyze... If you are unlucky, you may need to pull in gene ontology data or data about gene function in other organisms to help you rank variants...
Finally, any variants you identify need to be validated with Sanger sequencing... and then the fun begins. You need to validate further by sequencing larger cohorts or doing some functional wet-lab experiments to generate biologically relevant data.