Question

Difference between FATHMM SCORE & FATHMM_MKL_CODING_SCORE

4.8 years ago

vaish01kv • 0

I have 4 columns in my annotated VCF: "FATHMM_SCORE, FATHMM_PRED, FATHMM_MKL_CODING_SCORE, FATHMM_MKL_CODING_PRED". Can someone please explain the difference between these two scores?

ANNOVAR OUTPUT • 2.2k views

updated 4.8 years ago by Kevin Blighe • written 4.8 years ago by vaish01kv

score 0 · Answer 1 · 2019-07-01

FATHMM and FATHMM-MKL are in silico functional prediction tools that were developed by a group at the University of Bristol in England.

FATHMM

FATHMM came first and was tailored for coding variants. It has 3 sub-algorithms, built on training datasets of:

Inherited disease variants
Cancer mutations
Disease-specific variants

When using FATHMM, one should technically choose which sub-algorithm to use.

FATHMM-MKL

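FATHMM-MKL came later and extends prediction to non-coding variants, while still scoring coding variants (which is why your file has a separate FATHMM_MKL_CODING_SCORE). It was built on the following feature groups: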

46-Way Sequence Conservation: based on multiple sequence alignment scores, at the nucleotide level, of 46 vertebrate genomes compared with the human genome.
Histone Modifications (ChIP-Seq): based on ChIP-Seq peak calls for histone modifications.
Transcription Factor Binding Sites (TFBS PeakSeq): based on PeakSeq peak calls for various transcription factors.
Open Chromatin (DNase-Seq): based on DNase-Seq peak calls.
100-Way Sequence Conservation: based on multiple sequence alignment scores, at the nucleotide level, of 100 vertebrate genomes compared with the human genome.
GC Content: based on a single measure for GC content calculated using a span of five nucleotide bases from the UCSC Genome Browser.
Open Chromatin (FAIRE): based on formaldehyde-assisted isolation of regulatory elements (FAIRE) peak calls.
Transcription Factor Binding Sites (TFBS SPP): based on SPP peak calls for various transcription factors.
Genome Segmentation: based on genome-segmentation states using a consensus merge of segmentations produced by the ChromHMM and Segway software.
Footprints: based on annotations describing DNA footprints across cell types from ENCODE.

[source: https://academic.oup.com/bioinformatics/article/31/10/1536/177080#84558293]

As with other algorithms developed at the time, it was observed that sequence conservation is the single best predictor of pathogenicity.
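In practice, the two scores sit on different scales and run in opposite directions, so they cannot be compared number for number. Here is a minimal Python sketch of how each might be turned into a call. The cut-offs (FATHMM at or below -1.5 is damaging; FATHMM-MKL coding above 0.5 is deleterious) are the ones I recall from the dbNSFP documentation and should be treated as assumptions: check them against the README of the dbNSFP release that your ANNOVAR database was built from. The function names and the example record are hypothetical.

```python
from typing import Optional

# Assumed cut-offs; verify against the README of your dbNSFP release.
FATHMM_CUTOFF = -1.5      # FATHMM_SCORE at or below this -> "D" (damaging)
FATHMM_MKL_CUTOFF = 0.5   # FATHMM_MKL_CODING_SCORE above this -> "D" (deleterious)


def parse_score(raw: str) -> Optional[float]:
    """Convert an annotation field to float; '.' or an empty field means missing."""
    raw = raw.strip()
    if raw in ("", "."):
        return None
    return float(raw)


def call_fathmm(raw: str) -> Optional[str]:
    """Lower (more negative) FATHMM scores are more likely damaging."""
    score = parse_score(raw)
    if score is None:
        return None
    return "D" if score <= FATHMM_CUTOFF else "T"


def call_fathmm_mkl_coding(raw: str) -> Optional[str]:
    """FATHMM-MKL scores are probability-like (0 to 1); higher is more likely deleterious."""
    score = parse_score(raw)
    if score is None:
        return None
    return "D" if score > FATHMM_MKL_CUTOFF else "N"


# Hypothetical annotated record using the column names from the question.
record = {"FATHMM_SCORE": "-2.31", "FATHMM_MKL_CODING_SCORE": "0.97"}
print(call_fathmm(record["FATHMM_SCORE"]))
print(call_fathmm_mkl_coding(record["FATHMM_MKL_CODING_SCORE"]))
```

If the calls derived this way disagree with the FATHMM_PRED and FATHMM_MKL_CODING_PRED columns in your file, trust the columns and re-check the cut-offs for your database version.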

I list other in silico prediction tools here: A: pathogenicity predictors of cancer mutations

Kevin