I'm trying to annotate 400,000 variants in a VCF file with dbNSFP3.5a, using the hg19 human genome build. To use the hg19/GRCh37 coordinates, I processed dbNSFPv3.5a.zip with the following commands:
unzip dbNSFPv3.5a.zip
head -n1 dbNSFP3.5a_variant.chr1 > h
cat dbNSFP3.5a_variant.chr* | grep -v ^#chr | awk '$8 != "."' | sort -k8,8 -k9,9n - | cat h - | bgzip -c > dbNSFP_hg19.gz
tabix -s 8 -b 9 -e 9 dbNSFP_hg19.gz
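The commands above depend on columns 8 and 9 holding the hg19 chromosome and position. One way to check that is a small helper like the sketch below. The function name `show_header_cols` is hypothetical, and the sketch assumes the file is tab-delimited with a single header line; it only prints what it finds at the requested columns and does not validate anything.

```shell
# Hypothetical helper (not part of dbNSFP, SnpSift, or tabix):
# print the header names found at the given 1-based column numbers
# of a tab-delimited file, so the column indices passed to
# `sort -k` and `tabix -s/-b/-e` can be checked by eye.
show_header_cols() {
    file="$1"
    shift
    for col in "$@"; do
        # Read only the header line and print "<column number><TAB><header name>"
        head -n1 "$file" | awk -F'\t' -v c="$col" '{ print c "\t" $c }'
    done
}

# Example call (assumes the unzipped chr1 file is in the current directory):
# show_header_cols dbNSFP3.5a_variant.chr1 8 9
```

If the names printed for columns 8 and 9 are not the hg19 chromosome and position fields, the `awk`, `sort`, and `tabix` column numbers would need to be adjusted to match.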
Nevertheless, only 0.36% of the variants were annotated with this database (using both dbSNP151 and dbSNP150), which is much lower than the 15.96% I got with the dbNSFP2.9.txt.gz database (also using both dbSNP151 and dbSNP150).
Can anybody help me figure out what is causing this?