ANNOVAR doesn't output CADD scores
1
0
Entering edit mode
8 months ago
AMARU • 0

Hi,

I followed the Annovar tutorial with the default dataset (avsnp147, ExAC and dbnsfp30a). The tutorial can be found here: https://annovar.openbioinformatics.org/en/latest/user-guide/startup/

The resulting VCF had the expected format and annotations, including CADD scores. I then repeated the run using the gnomad211_exome, avsnp150, and dbnsfp42c datasets instead of those above, but the resulting VCF file contains all the expected annotations except the CADD scores. These datasets were downloaded following the ANNOVAR guidelines.

The header of the vcf doesn't even include the following:

##INFO=<ID=CADD_raw,Number=.,Type=Float,Description="CADD_raw annotation provided by ANNOVAR">
##INFO=<ID=CADD_phred,Number=.,Type=Float,Description="CADD_phred annotation provided by ANNOVAR">

Can someone tell me why this is happening? Do any of the datasets used in the second case not include CADD scores?

Below is the command I used:

perl ./annovar/table_annovar.pl \
  in.vcf \
  humandb/ \
  -buildver hg19 \
  -out myanno.Equal \
  -remove \
  -protocol refGene,cytoBand,gnomad211_exome,avsnp150,dbnsfp42c \
  -operation g,r,f,f,f \
  -nastring . \
  -vcfinput \
  -polish

Thanks in advance.

Annovar CADD • 860 views
8 months ago
Ram 43k

Simple answer: You switched from dbNSFP academic version to commercial version, and the commercial version does not include CADD.

How to get to this answer:

You are missing CADD. CADD comes from dbNSFP. You used dbNSFP30a and it worked. Then you used dbNSFP42c and it did not. From previous experience, I know that the a and c suffixes are significant somehow - this is the only place where experience helps, but even if I did not know this, I'd look for differences between 30a and 42c and probably end up here a few minutes later than I did: http://database.liulab.science/dbNSFP#version

Two branches of dbNSFP are provided: dbNSFP4.4a suitable for academic use, which includes all the resources, and dbNSFP4.4c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, ClinPred, CADD, LINSIGHT, and GenoCanyon.
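If you would rather see the difference directly than rely on the release notes, comparing the header columns of the two database files should show it. This is only a minimal sketch: it assumes the first line of each ANNOVAR database file is a tab-separated header, and the file names in the usage line are assumptions based on ANNOVAR's usual naming in humandb/. The function names are made up for illustration.

```shell
#!/bin/sh
# Print the header columns of a database file, one per line, sorted.
# Assumes line 1 is a tab-separated header.
header_columns() {
    head -n 1 "$1" | tr '\t' '\n' | sort -u
}

# Print columns present in the first file's header but missing from the second.
columns_only_in_first() {
    first_cols=$(mktemp)
    second_cols=$(mktemp)
    header_columns "$1" > "$first_cols"
    header_columns "$2" > "$second_cols"
    # comm -23 keeps only lines unique to the first sorted input
    comm -23 "$first_cols" "$second_cols"
    rm -f "$first_cols" "$second_cols"
}

# Hypothetical usage (file names are assumptions):
# columns_only_in_first humandb/hg19_dbnsfp30a.txt humandb/hg19_dbnsfp42c.txt
```

Anything printed is a column that the first database provides and the second does not; for this question, the CADD columns would be expected to show up there.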

All this is just to say that you did everything right - there is but one leap you needed to take to get to the solution yourself. Keep up this approach (of taking things that work and introducing small changes that might break them, then figure out how those small changes broke them) and you'll learn things super fast.
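Once you re-run with the academic branch in place of dbnsfp42c (presumably dbnsfp42a in ANNOVAR's naming; check what is available for download), a quick way to confirm the result is to list the INFO IDs declared in the output VCF header. A minimal sketch, where the function name is hypothetical and the output file name in the usage line is an assumption based on your -out and -buildver values:

```shell
#!/bin/sh
# Print the INFO IDs declared in a VCF header, one per line.
vcf_info_ids() {
    grep '^##INFO=<ID=' "$1" | sed 's/^##INFO=<ID=\([^,>]*\).*/\1/'
}

# Hypothetical usage (output file name is an assumption):
# vcf_info_ids myanno.Equal.hg19_multianno.vcf | grep -i '^CADD' \
#     || echo "no CADD fields declared"
```

If CADD_raw and CADD_phred are listed, the header lines you were missing are back.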


Really great answer. I suspected that one of the databases didn't include the CADD scores, but I didn't know which one was the problem.

Thanks a lot.


No problem. The idea that CADD could come from dbNSFP is also a thing from experience. Anyway, please accept my answer to provide closure to the question.
