Washington University, St Louis, USA
It is definitely possible to assess paternity from whole genome sequence (WGS) data. Paternity can probably be established with as few as a few dozen, or perhaps a few hundred, well-chosen single nucleotide polymorphisms (SNPs). With decent WGS data you can expect to genotype millions of SNPs, so a paternity assessment from such data should be very confident. What data do you have available? Assuming that you have raw sequence data (e.g., FASTQ or unaligned BAM files), you will first need to align it to an appropriate reference genome.
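To make the "few dozen SNPs" point concrete, here is a minimal Python sketch of the simplest paternity check, a Mendelian-inconsistency count: at each biallelic SNP, ask whether the child's genotype could have been formed from one allele of the mother and one allele of the alleged father. It assumes genotypes are already coded as alternate-allele counts (0, 1, 2, or None for a missing call); the function names and toy data are hypothetical. In real data, genotyping errors and de novo mutations mean even the true father will show a small non-zero rate, so the decision rests on the rate across many sites rather than on any single mismatch.

```python
from typing import Optional, Sequence, Set, Tuple

Genotype = Optional[int]  # 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing


def transmissible(genotype: int) -> Set[int]:
    """Alt-allele counts (0 or 1) that a parent with this genotype can pass on."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]


def mendelian_errors(
    child: Sequence[Genotype],
    mother: Sequence[Genotype],
    father: Sequence[Genotype],
) -> Tuple[int, int]:
    """Return (inconsistent_sites, sites_compared) for a putative trio."""
    errors = compared = 0
    for c, m, f in zip(child, mother, father):
        if c is None or m is None or f is None:
            continue  # skip any site with a missing call
        compared += 1
        # Every child genotype reachable with one maternal + one paternal allele.
        possible = {a + b for a in transmissible(m) for b in transmissible(f)}
        if c not in possible:
            errors += 1
    return errors, compared


if __name__ == "__main__":
    child = [1, 2, 0, 1, 1, None]
    mother = [0, 1, 0, 2, 1, 1]
    alleged_father = [2, 1, 1, 0, 0, 1]
    other_man = [0, 0, 2, 0, 2, 1]
    print(mendelian_errors(child, mother, alleged_father))  # (0, 5)
    print(mendelian_errors(child, mother, other_man))  # (3, 5)
```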
There are several online tutorials to give you the general idea:
Note: both of the above tutorials are a little out of date. Current best practice would be to use bwa mem (available in current bwa installations). See http://bio-bwa.sourceforge.net/bwa.shtml
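As a rough sketch of the bwa mem step, the Python below only assembles the shell pipeline (bwa mem piped into samtools sort) rather than running it. The file names, sample name and read-group fields are placeholders I have made up, and it assumes the reference has already been indexed with bwa index. GATK expects a read group on every read, which is why -R is set here.

```python
import shlex


def bwa_mem_pipeline(ref_fa: str, fq1: str, fq2: str, sample: str,
                     out_bam: str, threads: int = 8) -> str:
    """Build a 'bwa mem | samtools sort' command line for one paired-end sample."""
    # bwa expects a literal backslash-t between read-group fields;
    # the ID/SM/PL values here are placeholders.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    bwa = ["bwa", "mem", "-t", str(threads), "-R", read_group, ref_fa, fq1, fq2]
    sort = ["samtools", "sort", "-o", out_bam, "-"]  # "-" reads SAM from stdin
    return f"{shlex.join(bwa)} | {shlex.join(sort)}"


if __name__ == "__main__":
    cmd = bwa_mem_pipeline("ref.fa", "child_R1.fastq.gz", "child_R2.fastq.gz",
                           "child", "child.sorted.bam")
    print(cmd)
    # To actually run it: subprocess.run(cmd, shell=True, check=True)
```

You would run this once per individual (mother, father, child), giving each its own sample name so the read groups stay distinct downstream.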
Once you have aligned your data, you will probably want to mark duplicate reads and perform base quality score recalibration (BQSR). For sample commands covering bwa mem alignment, duplicate marking, and BQSR, see here: http://pmbio.org/module%202/0002/01/31/Alignment/
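In GATK4 command-line syntax, the duplicate-marking and BQSR steps look roughly like the following. Again this is a hedged sketch that only builds argument lists (with an optional run step). The output file names and the known-sites VCF (for example a dbSNP release matching your reference build) are assumptions. BaseRecalibrator needs at least one --known-sites file, and the reference needs its .fai and .dict indexes alongside it.

```python
import shlex
import subprocess
from typing import List, Sequence


def dedup_and_bqsr(ref_fa: str, sorted_bam: str, sample: str,
                   known_sites: Sequence[str]) -> List[List[str]]:
    """Return GATK4 commands for MarkDuplicates -> BaseRecalibrator -> ApplyBQSR."""
    if not known_sites:
        raise ValueError("BaseRecalibrator needs at least one --known-sites VCF")
    dedup_bam = f"{sample}.dedup.bam"
    recal_table = f"{sample}.recal.table"
    known = [arg for vcf in known_sites for arg in ("--known-sites", vcf)]
    return [
        ["gatk", "MarkDuplicates", "-I", sorted_bam, "-O", dedup_bam,
         "-M", f"{sample}.dup_metrics.txt"],
        ["gatk", "BaseRecalibrator", "-R", ref_fa, "-I", dedup_bam,
         *known, "-O", recal_table],
        ["gatk", "ApplyBQSR", "-R", ref_fa, "-I", dedup_bam,
         "--bqsr-recal-file", recal_table, "-O", f"{sample}.bqsr.bam"],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run_all(dedup_and_bqsr("ref.fa", "child.sorted.bam", "child",
                           ["dbsnp.vcf.gz"]))
```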
Next, you will want to run the GATK variant caller. For a trio analysis, I suggest running GATK HaplotypeCaller in GVCF mode and then performing joint genotyping. See here for a tutorial on this topic:
This is all explained in great detail in the excellent GATK Best Practices for Variant Discovery workshops organized by the Broad Institute. See
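For orientation, the GVCF-then-joint-genotyping flow for a trio comes down to one HaplotypeCaller run per sample in -ERC GVCF mode, a step that merges the three GVCFs, and a single GenotypeGVCFs call. The sketch below assumes CombineGVCFs for the merge step (GATK also offers GenomicsDBImport, which is the usual choice for larger cohorts); the sample and file names are placeholders.

```python
import shlex
from typing import Dict, List


def trio_joint_calling(ref_fa: str, bams: Dict[str, str],
                       out_vcf: str = "trio.vcf.gz") -> List[List[str]]:
    """Per-sample HaplotypeCaller in GVCF mode, then CombineGVCFs + GenotypeGVCFs."""
    commands: List[List[str]] = []
    combine_inputs: List[str] = []
    for sample, bam in bams.items():
        gvcf = f"{sample}.g.vcf.gz"
        commands.append(["gatk", "HaplotypeCaller", "-R", ref_fa, "-I", bam,
                         "-O", gvcf, "-ERC", "GVCF"])
        combine_inputs += ["-V", gvcf]
    combined = "trio.g.vcf.gz"
    commands.append(["gatk", "CombineGVCFs", "-R", ref_fa, *combine_inputs,
                     "-O", combined])
    commands.append(["gatk", "GenotypeGVCFs", "-R", ref_fa, "-V", combined,
                     "-O", out_vcf])
    return commands


if __name__ == "__main__":
    trio = {"mother": "mother.bqsr.bam", "father": "father.bqsr.bam",
            "child": "child.bqsr.bam"}
    for cmd in trio_joint_calling("ref.fa", trio):
        print(shlex.join(cmd))
```

The point of the GVCF route is that the final genotyping step sees all three samples at once, so a site that is variant in only one individual still gets a confident reference call in the other two, which is exactly what a concordance comparison needs.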
Finally, assuming you get through the above, you should have a VCF with genotype calls for millions of SNPs for your trio. You then need to compare SNP genotype concordance between the individuals in your trio to estimate kinship. This is itself a complicated area of research that I am not very familiar with, but paternity should be one of the simpler relationships to prove. I believe the KING tool is popular for this and could take the above VCF as a starting point.
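To give a feel for what KING computes, here is a small Python sketch of the KING-robust kinship estimator for one pair, kinship = (N_het,het - 2 * N_opp_hom) / (N_het(i) + N_het(j)), as I understand it from the KING paper (Manichaikul et al., 2010). It reads biallelic, PASS-filtered SNP genotypes from a plain or gzipped VCF using only the standard library. The function names and the VCF path are hypothetical, and this is no substitute for running KING itself, which reads PLINK binary files (so the VCF is normally converted first, e.g. with PLINK's --vcf and --make-bed options). For a true parent-offspring pair you would expect a kinship near 0.25 together with almost no opposite-homozygote (IBS0) sites; unrelated pairs sit near 0. KING's documentation uses roughly 0.177 to 0.354 as the first-degree range.

```python
import gzip
from typing import Dict, List, Optional, Sequence, Tuple

Dosage = Optional[int]  # alt-allele count 0/1/2, None if missing


def gt_to_dosage(sample_field: str) -> Dosage:
    """Map a VCF sample field such as '0/1:...' or '1|1' to an alt-allele count."""
    alleles = sample_field.split(":", 1)[0].replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None  # missing ('.'), haploid, or non-biallelic call
    return int(alleles[0]) + int(alleles[1])


def read_dosages(vcf_path: str, samples: Sequence[str]) -> Dict[str, List[Dosage]]:
    """Collect dosages for the named samples at PASS, single-base biallelic SNPs."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    dosages: Dict[str, List[Dosage]] = {s: [] for s in samples}
    columns: Dict[str, int] = {}
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("##"):
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                columns = {name: i for i, name in enumerate(fields)}
                continue
            ref, alt, filt = fields[3], fields[4], fields[6]
            if len(ref) != 1 or len(alt) != 1 or alt in (".", "*"):
                continue  # keep single-base, biallelic SNPs only
            if filt not in ("PASS", "."):
                continue  # drop sites that failed filtering
            for s in samples:
                dosages[s].append(gt_to_dosage(fields[columns[s]]))
    return dosages


def king_robust(g1: Sequence[Dosage], g2: Sequence[Dosage]) -> Tuple[float, float]:
    """Return (kinship, IBS0 fraction) for two dosage vectors."""
    het_het = opp_hom = het1 = het2 = compared = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        compared += 1
        het1 += int(a == 1)
        het2 += int(b == 1)
        het_het += int(a == 1 and b == 1)
        opp_hom += int({a, b} == {0, 2})  # AA vs aa: no allele shared (IBS0)
    if het1 + het2 == 0:
        raise ValueError("no heterozygous sites to compare")
    return (het_het - 2 * opp_hom) / (het1 + het2), opp_hom / compared


if __name__ == "__main__":
    father = [1, 1, 0, 2, 1, 0, 1, 2]
    child = [1, 2, 1, 1, 0, 0, 1, 1]
    kinship, ibs0 = king_robust(father, child)
    print(f"kinship={kinship:.3f} IBS0={ibs0:.3f}")  # kinship=0.222 IBS0=0.000
    # With real data: d = read_dosages("trio.vcf.gz", ["father", "child"])
    #                 king_robust(d["father"], d["child"])
```

The toy vectors are far too short to mean anything; with millions of sites the estimate becomes stable, and the IBS0 fraction is what separates parent-offspring (essentially zero) from full siblings at a similar kinship value.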