Issue with ExAC and 1000G hg38 lifted-over data and systematic failure of annotation software!
6.1 years ago
reza.jabal ▴ 570

ExAC data with hg38 coordinates has been available for filter-based annotation since late 2015, but there seems to be a systematic problem with using lifted-over ExAC and 1000G data for annotation! Mainstream annotation tools (Annovar, VEP and snpEff) fail to report MAF for variants whose contig is reversed in the hg38 assembly. As a result, variants that are common in ExAC and 1000G populations might be misinterpreted as novel simply because the annotation software does not report the corresponding MAF.

I was wondering if anyone here has come across the same problem and if so, how they have tackled this problem?

annotation Exac 1000g hg38 • 2.4k views
5.9 years ago
Pablo ★ 1.9k

In my opinion, this sounds like a problem in the lift-over procedure, as opposed to a failure in the annotation software.

To lift over variants correctly (e.g. a VCF file from ExAC), it is not enough to change the coordinates: the variant's REF and ALT fields must also be complemented accordingly in reversed genes (I'm talking about the Watson-Crick complement). If this last part is not done right, downstream annotation software will fail to annotate simply because the input is incorrect.
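To make the allele-flipping step concrete, here is a minimal sketch in plain Python. The function names and the `strand_reversed` flag are hypothetical, not taken from any liftover tool; the sketch assumes you already know from the chain file whether a record maps to the reverse strand. For single-base alleles the complement is enough; for multi-base alleles the sequence also has to be reversed. Indels would additionally need their anchor base and position adjusted, which this sketch does not attempt.

```python
# Hypothetical helpers for illustration only; not part of any liftover tool.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def lift_alleles(ref, alts, strand_reversed):
    """Flip REF and ALT alleles when the lifted-over region is strand-reversed.

    strand_reversed is an assumed input: True if the liftover chain maps this
    record to the opposite strand in the target assembly.
    """
    if not strand_reversed:
        return ref, list(alts)
    return reverse_complement(ref), [reverse_complement(a) for a in alts]


if __name__ == "__main__":
    # An A>G SNV in a reversed region should be written as T>C after liftover.
    print(lift_alleles("A", ["G"], strand_reversed=True))
```

If the coordinates are lifted but this step is skipped, the record would still say A>G at a position where the hg38 reference base is T, so an allele-aware annotator would not match it to the observed variant.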

Again, this is an opinion / guess (since there was no sample data in the post, I cannot dig deeper).


Hi Pablo, thanks for your comment! Yes, you guessed right. Deeper investigation led me to realise that the problem actually lies in the lifted-over dbSNP data. I am now using a custom script to fill in the missing frequencies.