Difference between FATHMM SCORE & FATHMM_MKL_CODING_SCORE
2.3 years ago
vaish01kv • 0

My annotated VCF has four columns: FATHMM_SCORE, FATHMM_PRED, FATHMM_MKL_CODING_SCORE and FATHMM_MKL_CODING_PRED. Can someone please explain the difference between the FATHMM and FATHMM-MKL scores?

ANNOVAR OUTPUT • 1.1k views
2.3 years ago

FATHMM and FATHMM-MKL are in silico functional prediction tools that were developed by a group at the University of Bristol in England.

FATHMM

FATHMM came first and was tailored for coding variants. It has 3 sub-algorithms, each built on a different training dataset:

  1. Inherited disease variants
  2. Cancer mutations
  3. Disease-specific variants

When using FATHMM, one should technically choose which sub-algorithm to use.

FATHMM-MKL

FATHMM-MKL came later and extends prediction to non-coding variants. It also scores coding variants, which is what the FATHMM_MKL_CODING_SCORE column reports. It was built on the following data:

  • 46-Way Sequence Conservation: based on multiple sequence alignment scores, at the nucleotide level, of 46 vertebrate genomes compared with the human genome.
  • Histone Modifications (ChIP-Seq): based on ChIP-Seq peak calls for histone modifications.
  • Transcription Factor Binding Sites (TFBS PeakSeq): based on PeakSeq peak calls for various transcription factors.
  • Open Chromatin (DNase-Seq): based on DNase-Seq peak calls.
  • 100-Way Sequence Conservation: based on multiple sequence alignment scores, at the nucleotide level, of 100 vertebrate genomes compared with the human genome.
  • GC Content: based on a single measure for GC content calculated using a span of five nucleotide bases from the UCSC Genome Browser.
  • Open Chromatin (FAIRE): based on formaldehyde-assisted isolation of regulatory elements (FAIRE) peak calls.
  • Transcription Factor Binding Sites (TFBS SPP): based on SPP peak calls for various transcription factors.
  • Genome Segmentation: based on genome-segmentation states using a consensus merge of segmentations produced by the ChromHMM and Segway software.
  • Footprints: based on annotations describing DNA footprints across cell types from ENCODE.

[source: https://academic.oup.com/bioinformatics/article/31/10/1536/177080#84558293]

As with all algorithms developed at the time, it was observed that conservation is the single best predictor of pathogenicity.
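
To make the difference between the two columns concrete, here is a minimal Python sketch that turns each score into a call and puts them side by side. Note that the two scores run in opposite directions. The cutoffs are assumptions taken from the usual dbNSFP conventions and are not stated in this thread: FATHMM at or below -1.5 is called damaging ("D", otherwise "T"), and FATHMM-MKL coding above 0.5 is called damaging ("D", otherwise "N"). The `ID` column, the sample rows and the function names are hypothetical, and each cell is assumed to hold a single number or "." for missing; check the header of your own ANNOVAR output before relying on any of this.

```python
import csv
import io

# Assumed cutoffs (dbNSFP conventions; verify against your database version).
FATHMM_CUTOFF = -1.5      # lower FATHMM scores are more likely damaging
FATHMM_MKL_CUTOFF = 0.5   # higher FATHMM-MKL scores (0 to 1) are more likely damaging


def parse_score(text):
    """Convert a score cell to float; '.' or an empty cell means missing."""
    text = text.strip()
    if text in ("", "."):
        return None
    return float(text)


def call_fathmm(score):
    """Return 'D' (damaging) or 'T' (tolerated); None if the score is missing."""
    if score is None:
        return None
    return "D" if score <= FATHMM_CUTOFF else "T"


def call_fathmm_mkl(score):
    """Return 'D' (damaging) or 'N' (neutral); None if the score is missing."""
    if score is None:
        return None
    return "D" if score > FATHMM_MKL_CUTOFF else "N"


def compare_calls(handle):
    """Read a tab-delimited table and return (ID, FATHMM call, FATHMM-MKL call) per row."""
    results = []
    for row in csv.DictReader(handle, delimiter="\t"):
        fathmm = call_fathmm(parse_score(row["FATHMM_SCORE"]))
        mkl = call_fathmm_mkl(parse_score(row["FATHMM_MKL_CODING_SCORE"]))
        results.append((row["ID"], fathmm, mkl))
    return results


# Hypothetical rows for illustration only; these are not real variants.
SAMPLE = (
    "ID\tFATHMM_SCORE\tFATHMM_MKL_CODING_SCORE\n"
    "var1\t-3.2\t0.98\n"
    "var2\t0.7\t0.12\n"
    "var3\t.\t0.75\n"
)

if __name__ == "__main__":
    for variant_id, fathmm, mkl in compare_calls(io.StringIO(SAMPLE)):
        print(variant_id, fathmm, mkl)
```

The two tools can disagree, and one can be missing where the other is present, so it is worth keeping both calls rather than collapsing them into one.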

I list other in silico prediction tools in another answer: "pathogenicity predictors of cancer mutations".

Kevin
