We'd like to perform trio analysis of mice to detect de novo mutations (via WGS and variant calling). The complication: our mouse facility performs breeding in bulk (~15 males and ~15 females together). Mom-pup pairs are obviously straightforward. To identify dads, our strategy is to sequence all of the males and infer paternity by comparing dad-pup SNPs. One method is to manually pick SNPs at heterozygous loci for discrimination, but I assume there are tools that can perform this analysis from bulk VCFs. Any suggestions?
I recently came across Peddy, which I'll be trying out soon. VCFtools also has a couple of implementations in --relatedness and --relatedness2, which are based on Yang et al. and Manichaikul et al., respectively. There's also KING, which I think was used in ExAC.
There are several options; here are a couple that came to mind.
The most rigorous one would be to use Mendel. Method 9 is pedigree selection. Basically, you specify all the possible pedigrees and it tells you which one is the most likely. Cons of this approach: 1) I am not sure it is easy to specify your data structure (one trio plus 14 unrelated males) to the software. You might have to do several comparisons between trios where the mom-pup pair stays the same and the dad changes. 2) I do not think Mendel takes VCF as input.
Another approach would be to use the number of Mendelian inconsistencies to find the most probable father, using for example the mendelian plugin (I have never tried it; GATK surely has something similar). Basically, you run it on all the possible trios, i.e. for each mom-pup pair you rotate through all the candidate dads. If the dads are not related to the mom, then one of them (the real one) should show a noticeably lower number of Mendelian inconsistencies than the others.
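To make the "rotate the dads" idea concrete, here is a minimal Python sketch of the counting step. It is not how the mendelian plugin or GATK implement it; it assumes you have already extracted diploid autosomal genotypes into plain Python lists (one entry per site, in the same site order for every animal), and the function names and the toy genotypes are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed representation: (0, 1) means a 0/1 call, None means a missing call.
Genotype = Optional[Tuple[int, int]]


def is_consistent(pup: Tuple[int, int], mom: Tuple[int, int], dad: Tuple[int, int]) -> bool:
    """True if one pup allele can come from mom and the other from dad."""
    a, b = pup
    return (a in mom and b in dad) or (b in mom and a in dad)


def count_inconsistencies(
    pup: List[Genotype], mom: List[Genotype], dad: List[Genotype]
) -> Tuple[int, int]:
    """Return (inconsistent sites, sites compared), skipping missing calls."""
    errors = compared = 0
    for p, m, d in zip(pup, mom, dad):
        if p is None or m is None or d is None:
            continue
        compared += 1
        if not is_consistent(p, m, d):
            errors += 1
    return errors, compared


def rank_candidate_dads(
    pup: List[Genotype], mom: List[Genotype], candidates: Dict[str, List[Genotype]]
) -> List[Tuple[str, int, int, float]]:
    """Rank candidate dads by Mendelian error rate, lowest first."""
    results = []
    for name, dad in candidates.items():
        errors, compared = count_inconsistencies(pup, mom, dad)
        # A male with no comparable sites cannot be evaluated; sort him last.
        rate = errors / compared if compared else float("inf")
        results.append((name, errors, compared, rate))
    return sorted(results, key=lambda r: r[3])


# Toy data (hypothetical): four sites, one mom-pup pair, two candidate males.
mom = [(0, 0), (0, 1), (1, 1), (0, 0)]
pup = [(0, 1), (0, 0), (1, 1), (0, 1)]
males = {
    "male_A": [(1, 1), (0, 0), (0, 1), (0, 1)],
    "male_B": [(0, 0), (1, 1), (0, 0), None],
}

for name, errors, compared, rate in rank_candidate_dads(pup, mom, males):
    print(f"{name}: {errors}/{compared} inconsistent sites (rate {rate:.2f})")
```

Comparing the error rate rather than the raw count keeps males with many missing calls from looking artificially good. With real data the true dad will rarely hit exactly zero because of genotyping errors (and the de novo mutations you are after), so look for one male that clearly stands apart from the rest.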
Be advised that whichever analysis you perform, you should run it on a very reliable set of SNPs.