Hi,
On the gnomAD website there are both hg19 and hg38 VCFs. The hg38 VCFs are a liftover from hg19. https://gnomad.broadinstitute.org/downloads
I was analyzing some data using both the hg19 and hg38 gnomAD VCFs and found something strange. For instance, the variant rs11354897 (https://gnomad.broadinstitute.org/variant/7-72209527-CA-C) is missing in the hg38 VCF.
In hg19 :
bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz 7:72209527
results in :
7 72209527 rs11354897 CA C 4.31187e+06 PASS AC=6487;AN=31348;AF=0.206935 ...
So perfect, the variant is there.
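As a quick sanity check on a record like the one above, here is a small Python sketch that parses the visible part of the line and confirms that AF equals AC/AN. The function name `parse_site` is my own (hypothetical), and it assumes a biallelic record with single-valued AC, AN and AF fields; the truncated part of the INFO column is simply left out.

```python
def parse_site(line: str) -> dict:
    """Parse the first eight columns of a sites-only VCF record.

    Assumes a biallelic record, so AC and AF hold a single value each.
    """
    chrom, pos, vid, ref, alt, qual, flt, info = line.split()[:8]
    fields = {}
    for item in info.split(";"):
        # Flag-type INFO entries have no "=", so value is then an empty string.
        key, _, value = item.partition("=")
        fields[key] = value
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt,
        "filter": flt,
        "ac": int(fields["AC"]),
        "an": int(fields["AN"]),
        "af": float(fields["AF"]),
    }


# Visible part of the hg19 record shown above (INFO truncated in the post).
record = parse_site(
    "7 72209527 rs11354897 CA C 4.31187e+06 PASS AC=6487;AN=31348;AF=0.206935"
)
# AF should match AC / AN up to the rounding used in the VCF.
af_matches = abs(record["ac"] / record["an"] - record["af"]) < 1e-6
```

The same helper can be pointed at the output of any of the `bcftools view -H` calls in this post, as long as the record is biallelic.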
Now in hg38 :
Looking at the Ensembl website for rs11354897, the position on hg38 is 7:72744552:
http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=7:72744052-72745052;v=rs11354897;vdb=variation;vf=416257549
bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.7.liftover_grch38.vcf.bgz chr7:72744550-7274455
gives me no results.
Is there any explanation for this?
Should I report it to the gnomAD team?
--
EDIT 13/09/2019 :
Checking other SNPs in gnomAD, I found another example:
In hg19: chr17-41961451-T-C https://gnomad.broadinstitute.org/variant/17-41961451-T-C
This SNP is in dbSNP (https://www.ncbi.nlm.nih.gov/snp/rs231518) and has an hg38 position: chr17:43884083
Looking in the official gnomAD hg38 VCF: no results!
Looking in the Ensembl gnomAD hg38 VCF (from here: ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/):
bcftools view ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/gnomad.genomes.r2.1.sites.grch38.chr17_noVEP.vcf.gz 17:43884083-43884083
result :
17 43884083 rs231518 C T 1.77035e+07 PASS AC=27429;AN=31374;AF=0.874259 ...
I guess I will use the Ensembl gnomAD hg38 VCF for now. But it's strange that the official one from gnomAD is missing this SNP.
Thanks
Edit 17/10/2019 :
gnomAD v3.0 is now out. They re-analysed the WGS data directly on hg38 (not a "simple" liftover), and I can now see the SNP of interest:
bcftools view -H https://storage.googleapis.com/gnomad-public/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.chr17.vcf.bgz chr17:43884083-43884083
chr17 43884083 rs231518 C T 1.67035e+07 PASS AC=16885;AN=143172;AF=0.117935;variant_type=snv;n_alt_al ...
Problem solved. Thanks gnomAD ;)
Thanks Pierre. However, I have another example with this SNP where a liftover exists:
Looking in the gnomAD hg38 VCF (liftover from the hg19 gnomAD VCF): no results.
After lifting over the position to hg19 I found chr17:41961451. On the gnomAD website the variant 17-41961451-T-C pops up at this position: https://gnomad.broadinstitute.org/variant/17-41961451-T-C
In fact the genome sequences of hg19 and hg38 differ here. In hg19 the ref is T; in hg38 the ref is C. So in this case ref and alt are switched between hg19 and hg38. Now I would like to know if there is a way to annotate my hg38 variant of interest 17-43884083-C-T based on this. In the current example the gnomAD AF should be 0.8893. One idea would be, for all heterozygous SNPs, to test both ref-alt and alt-ref (e.g. C-T and T-C) against gnomAD.
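A minimal Python sketch of that last idea, testing both orientations. Everything here is an assumption for illustration: the helper `lookup_af`, the in-memory dict `sites` standing in for a gnomAD lookup, and the AC/AN counts (made-up round numbers, not gnomAD values). It also assumes a biallelic SNV, because only then is the frequency of the other allele simply (AN - AC) / AN.

```python
from typing import Dict, Optional, Tuple

# (chrom, pos, ref, alt) -> (AC, AN); hypothetical stand-in for a gnomAD query.
SiteKey = Tuple[str, int, str, str]


def lookup_af(
    sites: Dict[SiteKey, Tuple[int, int]],
    chrom: str,
    pos: int,
    ref: str,
    alt: str,
) -> Optional[float]:
    """Return the frequency of `alt`, trying ref-alt first and then alt-ref.

    Only valid for biallelic SNVs: at a multi-allelic site the swapped
    frequency is NOT (AN - AC) / AN.
    """
    direct = (chrom, pos, ref, alt)
    if direct in sites:
        ac, an = sites[direct]
        return ac / an
    swapped = (chrom, pos, alt, ref)
    if swapped in sites:
        # The database counts our "ref" as its alt, so take the complement.
        ac, an = sites[swapped]
        return (an - ac) / an
    return None


# Toy database: the site is stored in the hg19 orientation (ref T, alt C),
# already placed at the hg38 coordinate. Counts are illustrative only.
sites = {("17", 43884083, "T", "C"): (800, 1000)}

af_c_to_t = lookup_af(sites, "17", 43884083, "C", "T")  # found via the swap
af_t_to_c = lookup_af(sites, "17", 43884083, "T", "C")  # found directly
af_absent = lookup_af(sites, "17", 43884083, "C", "G")  # not in the database
```

Note that the swap only makes sense once both records are on the same coordinate system, so the position still has to be lifted over first.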