I have a set of eight Illumina-sequenced trios for an organism for which a reference genome is available. I'd now like to call genotypes on these data, taking advantage of both a "classic" multisample approach and a trio-aware approach. I understand that bcftools (part of the samtools suite) can do trio-aware calling, but I'm not sure whether it can work with multiple trios. At the same time, GATK (and samtools too?) can run a Bayesian model based on observed allele frequencies, assuming a flat population structure.
So my question is: are there any tools that can combine both approaches?
Thanks a lot!
You can always apply a hard filter: take the union of the two parents' calls and intersect it with the child's calls. After filtering, the child contains only SNVs that were also found in at least one parent.
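For what it's worth, the hard filter described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming each sample's calls have already been parsed into sets of (chrom, pos, ref, alt) tuples; the function names and the toy data are hypothetical and not part of samtools, bcftools, or GATK.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant is keyed by (chromosome, 1-based position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]
# One trio: (child calls, mother calls, father calls).
Trio = Tuple[Iterable[Variant], Iterable[Variant], Iterable[Variant]]


def hard_filter_child(child: Iterable[Variant],
                      mother: Iterable[Variant],
                      father: Iterable[Variant]) -> Set[Variant]:
    """Keep only child variants that also appear in at least one parent."""
    parental_union = set(mother) | set(father)
    return set(child) & parental_union


def hard_filter_trios(trios: Dict[str, Trio]) -> Dict[str, Set[Variant]]:
    """Apply the same filter independently to each trio (hypothetical helper)."""
    return {name: hard_filter_child(c, m, f) for name, (c, m, f) in trios.items()}


# Toy data, made up purely for illustration.
mother = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")}
father = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A")}
child = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"), ("chr3", 7, "T", "C")}

kept = hard_filter_child(child, mother, father)
print(sorted(kept))
```

Note that this only compares variant sites, not genotypes, and it would also throw away any genuine de novo variants in the child, since by construction those are absent from both parents.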
Thanks, I currently do something along these lines, but I was wondering whether there is a more general way of taking this information into account when computing SNP quality scores...
samtools mpileup can call variants across multiple individuals at once. I believe this is most beneficial when there is evidence of a SNP at a site but coverage is low. http://samtools.sourceforge.net/mpileup.shtml