Missing SNP in gnomAD hg38 liftover VCFs
2.1 years ago


The gnomAD website provides both hg19 and hg38 VCFs; the hg38 VCFs are lifted over from hg19: https://gnomad.broadinstitute.org/downloads

I was analyzing some data with both the hg19 and the hg38 gnomAD VCFs and found something strange. For instance, SNP rs11354897 (https://gnomad.broadinstitute.org/variant/7-72209527-CA-C) is missing from the hg38 VCF.

In hg19:

bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz 7:72209527

returns:

7   72209527    rs11354897  CA  C   4.31187e+06 PASS    AC=6487;AN=31348;AF=0.206935 ...

So far so good: the SNP is there.

Now in hg38:

According to the Ensembl page for rs11354897, its position on hg38 is 7:72744552: http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=7:72744052-72745052;v=rs11354897;vdb=variation;vf=416257549

bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.7.liftover_grch38.vcf.bgz chr7:72744550-7274455

gives me no results.

Is there an explanation for this?

Should I report it to the gnomAD team?


EDIT 13/09/2019:

Checking other SNPs in gnomAD, I found another example:

In hg19: chr17-41961451-T-C (https://gnomad.broadinstitute.org/variant/17-41961451-T-C)

This SNP is in dbSNP (https://www.ncbi.nlm.nih.gov/snp/rs231518), where its hg38 position is chr17:43884083.

Looking in the official gnomAD hg38 VCF: no results!

Looking in the Ensembl gnomAD hg38 VCF (from here: ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/):

bcftools view ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/gnomad.genomes.r2.1.sites.grch38.chr17_noVEP.vcf.gz 17:43884083-43884083

Result:

17  43884083    rs231518    C   T   1.77035e+07 PASS    AC=27429;AN=31374;AF=0.874259 ...

I guess I will use the VEP gnomAD hg38 VCF for now, but it is strange that the official gnomAD one misses this SNP.


EDIT 17/10/2019:

gnomAD v3.0 is now out. The WGS data were re-analysed on hg38 (not a "simple" liftover), and I can now see the SNP of interest:

bcftools view -H https://storage.googleapis.com/gnomad-public/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.chr17.vcf.bgz chr17:43884083-43884083

chr17   43884083    rs231518    C   T   1.67035e+07 PASS    AC=16885;AN=143172;AF=0.117935;variant_type=snv;n_alt_al ...

Problem solved. Thanks gnomAD ;)


Running liftOver for this rs using chr7:72209527-72209528 returns a failure:

#Partially deleted in new (  Sequence insufficiently intersects one chain)

Thanks Pierre. However, I have another example, a SNP for which the liftover does exist:

   CHROM       POS REF   ALT    GT      AD
   chr17  43884083   C     T   0/1   16,25

Looking in gnomAD hg38 (lifted over from the hg19 gnomAD VCF), there are no results.

After lifting the position over to hg19, I get chr17:41961451. On the gnomAD website, variant 17-41961451-T-C shows up at this position: https://gnomad.broadinstitute.org/variant/17-41961451-T-C

In fact, the reference sequence differs between hg19 and hg38 here: in hg19 the REF is T, while in hg38 the REF is C, so REF and ALT are switched between the two builds. I would now like to know whether there is a way to annotate my hg38 variant of interest, 17-43884083-C-T, taking this into account. In this example, the gnomAD AF should be 0.8893. One idea would be, for all heterozygous SNPs, to test both REF-ALT and ALT-REF (e.g. C-T and T-C) against gnomAD.
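A minimal sketch of the "test both orientations" idea, assuming the gnomAD records have already been loaded into a Python dict keyed by (chrom, pos, ref, alt) with the ALT allele frequency as value. This index is hypothetical (not a gnomAD or bcftools API), positions are assumed to be on the same build, and sites are assumed biallelic, so that a REF/ALT swap turns the frequency into 1 - AF:

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory index: (chrom, pos, ref, alt) -> frequency of ALT.
Key = Tuple[str, int, str, str]


def lookup_af(index: Dict[Key, float], chrom: str, pos: int,
              ref: str, alt: str) -> Optional[Tuple[float, bool]]:
    """Return (frequency of `alt`, swapped) or None if neither orientation is found.

    Assumes a biallelic site: if the record is stored with REF and ALT
    swapped, the frequency of our ALT allele is 1 - AF of the stored ALT.
    """
    # First try the variant exactly as called (REF-ALT).
    direct = index.get((chrom, pos, ref, alt))
    if direct is not None:
        return direct, False
    # Then try the reversed orientation (ALT-REF), e.g. C-T vs T-C.
    swapped = index.get((chrom, pos, alt, ref))
    if swapped is not None:
        return 1.0 - swapped, True
    return None


if __name__ == "__main__":
    # Toy record, not real gnomAD data: stored as T>C with AF 0.25.
    index = {("chr1", 100, "T", "C"): 0.25}
    print(lookup_af(index, "chr1", 100, "C", "T"))  # (0.75, True)
```

The `swapped` flag is returned so that reversed matches can be reviewed separately; the 1 - AF conversion is only valid for biallelic sites and would not handle multiallelic records or strand flips.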


