Question

How do variant callers get genotype info?

2

7.1 years ago

afzm ▴ 20

How do variant callers determine which copy of a chromosome (which homolog) carries each heterozygous variant they detect in a diploid species? What information do they use? Just paired-end reads?

I found some statistical documentation on the GATK website, but I would like to understand whether other information is used, the rationale behind it, how accurate it is, and which factors affect the process.

Thank you very much

variant calling SNP indel phasing • 1.5k views

updated 7.1 years ago by Jeremy Leipzig 23k • written 7.1 years ago by afzm ▴ 20

1

7.1 years ago

Pierre Lindenbaum 166k

Mathematical Notes on SAMtools Algorithms

https://software.broadinstitute.org/gatk/media/docs/Samtools.pdf

.... good luck...

score 3 · Accepted Answer · 2018-06-08

Usually the caller has no idea which chromosome homolog a variant is on. It can only see variants that fall in the same read or read pair (uncommon with short reads, because neighboring heterozygous sites are often farther apart than a read pair spans), or it can try to infer which variants are on the same homolog (i.e. are phased) using read-backed phasing, for example as part of the local read assembly performed by GATK's HaplotypeCaller.
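To make the read-backed idea concrete, here is a minimal sketch in Python. It is not how GATK or any real caller implements phasing; the function name `phase_pair`, the read representation (a dict of position to `'ref'`/`'alt'`), and the simple majority vote are all assumptions for illustration. It only shows the core logic: two heterozygous sites can be linked when at least one read or read pair covers both.

```python
from collections import Counter

def phase_pair(reads, site_a, site_b):
    """Toy read-backed phasing of two heterozygous sites (hypothetical helper).

    reads: list of dicts mapping site position -> 'ref' or 'alt', one dict
           per read or read pair (an assumed, simplified representation).
    Returns 'cis' (both alt alleles on the same homolog), 'trans'
    (alt alleles on opposite homologs), or None if no read spans both
    sites or the evidence is tied.
    """
    votes = Counter()
    for obs in reads:
        # Only reads covering BOTH sites carry any phasing information.
        if site_a in obs and site_b in obs:
            same_allele_state = obs[site_a] == obs[site_b]
            votes['cis' if same_allele_state else 'trans'] += 1
    if votes['cis'] == votes['trans']:
        return None  # no spanning reads, or conflicting evidence
    return 'cis' if votes['cis'] > votes['trans'] else 'trans'

# Usage: sites at 100 and 150 share reads; site 400 is too far from 100.
reads = [
    {100: 'alt', 150: 'alt'},
    {100: 'ref', 150: 'ref'},
    {100: 'alt'},
    {150: 'ref', 400: 'alt'},
]
print(phase_pair(reads, 100, 150))
print(phase_pair(reads, 100, 400))
```

The second call illustrates the limitation with short reads: without a read spanning both sites, there is simply nothing to vote on.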

These in silico methods are spotty at best. Most people who need phasing just use a long-read technology, or they sequence the parents.
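For the "sequence the parents" route, the reasoning is plain Mendelian inheritance: one of the child's alleles came from each parent. A rough sketch, assuming unphased genotypes given as allele tuples and a hypothetical helper name `phase_by_trio` (real trio phasing tools also model genotype uncertainty, which this ignores):

```python
def phase_by_trio(child, mother, father):
    """Toy trio phasing at a single site (hypothetical helper).

    child, mother, father: unphased genotypes as 2-tuples of alleles,
    e.g. ('A', 'G').
    Returns (maternal_allele, paternal_allele) when exactly one assignment
    is consistent with Mendelian inheritance, otherwise None.
    """
    a, b = child
    consistent = set()
    # Try both ways of assigning the child's alleles to the two parents.
    for maternal, paternal in ((a, b), (b, a)):
        if maternal in mother and paternal in father:
            consistent.add((maternal, paternal))
    # Ambiguous (e.g. all three heterozygous) or a Mendelian violation.
    return consistent.pop() if len(consistent) == 1 else None

# Usage: the G allele can only have come from the father.
print(phase_by_trio(('A', 'G'), ('A', 'A'), ('A', 'G')))
# When everyone is heterozygous, the site cannot be phased this way.
print(phase_by_trio(('A', 'G'), ('A', 'G'), ('A', 'G')))
```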

https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_phasing_ReadBackedPhasing.php

https://www.illumina.com/techniques/sequencing/dna-sequencing/whole-genome-sequencing/phased-sequencing.html