Missing SNP in gnomAD hg38 liftover VCFs
4.6 years ago

Hi,

The gnomAD website (https://gnomad.broadinstitute.org/downloads) provides VCFs for both hg19 and hg38; the hg38 VCFs were lifted over from hg19.

I was analyzing some data with both the hg19 and hg38 gnomAD VCFs and found something strange. For instance, the SNP rs11354897 (https://gnomad.broadinstitute.org/variant/7-72209527-CA-C) is missing from the hg38 VCF.

In hg19:

bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz 7:72209527

returns:

7   72209527    rs11354897  CA  C   4.31187e+06 PASS    AC=6487;AN=31348;AF=0.206935 ...

Good, the SNP is there.
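The INFO column of that record is a list of semicolon-separated KEY=VALUE pairs, and AF is the ALT allele count divided by the total allele number (AC / AN). A minimal Python sketch that parses the INFO values shown above and recomputes AF; the function names are hypothetical, and it assumes a single ALT allele (no comma-separated AC lists):

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string ('KEY=VALUE;FLAG;...') into a dict."""
    fields = {}
    for item in info.split(";"):
        item = item.strip()
        if not item or item == "...":
            continue  # skip empty items and the '...' truncation used in this post
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags without '=' map to True
    return fields


def alt_allele_frequency(fields: dict) -> float:
    """Recompute AF as AC / AN, assuming a single ALT allele."""
    return int(fields["AC"]) / int(fields["AN"])


# INFO values shown for rs11354897 in the hg19 genomes VCF
fields = parse_info("AC=6487;AN=31348;AF=0.206935")
print(f"recomputed {alt_allele_frequency(fields):.6f}, reported {fields['AF']}")
```

Recomputing AF from AC and AN is a quick sanity check when comparing the same variant across files that may have been processed differently.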

Now in hg38:

According to the Ensembl website (http://www.ensembl.org/Homo_sapiens/Variation/Explore?db=core;r=7:72744052-72745052;v=rs11354897;vdb=variation;vf=416257549), the hg38 position of rs11354897 is 7:72744552.

bcftools view -H https://storage.googleapis.com/gnomad-public/release/2.1.1/liftover_grch38/vcf/genomes/gnomad.genomes.r2.1.1.sites.7.liftover_grch38.vcf.bgz chr7:72744550-7274455

gives me no results.
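When querying these files by hand, two details are easy to get wrong: the contig naming must match the file (the hg19 file above uses `7`, the hg38 liftover file uses `chr7`, and the Ensembl file further down uses `17`), and the end of a region must not come before its start. A small hypothetical Python helper that builds 1-based, inclusive `CHROM:START-END` strings for `bcftools view` and checks both; it is a sketch, not part of any gnomAD or bcftools tooling:

```python
from typing import Optional


def region(chrom: str, start: int, end: Optional[int] = None,
           chr_prefix: bool = True) -> str:
    """Build a 'CHROM:START-END' region string (1-based, inclusive).

    chr_prefix=True  -> 'chr7' style, as in the hg38 liftover file
    chr_prefix=False -> '7' style, as in the hg19 file
    """
    end = start if end is None else end
    if start < 1 or end < start:
        raise ValueError(f"invalid interval {start}-{end}")
    bare = chrom[3:] if chrom.startswith("chr") else chrom  # drop any existing prefix
    name = "chr" + bare if chr_prefix else bare
    return f"{name}:{start}-{end}"


print(region("7", 72209527, chr_prefix=False))  # query for the hg19 file
print(region("7", 72744552))                    # query for the hg38 liftover file
```

Raising on an inverted interval makes a mistyped coordinate fail loudly instead of silently returning no records.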

Is there an explanation for this?

Should I report it to the gnomAD team?

--

EDIT 13/09/2019:

Checking other SNPs in gnomAD, I found another example:

In hg19: chr17-41961451-T-C https://gnomad.broadinstitute.org/variant/17-41961451-T-C

This SNP is in dbSNP (https://www.ncbi.nlm.nih.gov/snp/rs231518) and has the hg38 position chr17:43884083.
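The variant IDs in the gnomAD URLs (e.g. 17-41961451-T-C) follow a CHROM-POS-REF-ALT pattern, so they can be turned directly into something bcftools can query. A minimal Python sketch, assuming simple IDs with no '-' inside the alleles; the `Variant` type and `parse_variant_id` function are hypothetical names:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str  # stored without a 'chr' prefix
    pos: int    # 1-based position, as in the VCF POS column
    ref: str
    alt: str

    def region(self) -> str:
        """Single-position 'CHROM:POS-POS' string, without a 'chr' prefix."""
        return f"{self.chrom}:{self.pos}-{self.pos}"


def parse_variant_id(variant_id: str) -> Variant:
    """Parse '17-41961451-T-C' or 'chr17-41961451-T-C' into its four parts."""
    chrom, pos, ref, alt = variant_id.split("-")
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # normalise to the '17' naming style
    return Variant(chrom, int(pos), ref.upper(), alt.upper())


v = parse_variant_id("chr17-41961451-T-C")
print(v, v.region())
```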

Looking in the official gnomAD hg38 VCF: no results!

Looking in the Ensembl gnomAD hg38 VCF (from here: ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/):

bcftools view ftp://ftp.ensembl.org/pub/data_files/homo_sapiens/GRCh38/variation_genotype/gnomad/r2.1/genomes/gnomad.genomes.r2.1.sites.grch38.chr17_noVEP.vcf.gz 17:43884083-43884083

result :

17  43884083    rs231518    C   T   1.77035e+07 PASS    AC=27429;AN=31374;AF=0.874259 ...

I guess I will use the VEP gnomAD hg38 VCF for now, but it's strange that the official one from gnomAD is missing this SNP.

Thanks

EDIT 17/10/2019:

gnomAD v3.0 is now out. This release re-analyses the WGS data directly on hg38 (not a "simple" liftover), and I can now see the SNP of interest:

bcftools view -H https://storage.googleapis.com/gnomad-public/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.chr17.vcf.bgz chr17:43884083-43884083

chr17   43884083    rs231518    C   T   1.67035e+07 PASS    AC=16885;AN=143172;AF=0.117935;variant_type=snv;n_alt_al ...

Problem solved. Thanks gnomAD ;)

4.6 years ago

Running liftOver for this rs with chr7:72209527-72209528 returns a failure:

#Partially deleted in new (  Sequence insufficiently intersects one chain)
chr7:72209527-72209528
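The failure report above pairs a comment line giving the reason with the region that could not be lifted. If you lift many positions at once, a small parser over that layout makes it easy to list which ones failed and why. A minimal Python sketch, assuming the unmapped output keeps this comment-then-record pattern for every failure (the `parse_unmapped` name is hypothetical):

```python
from typing import List, Tuple


def parse_unmapped(text: str) -> List[Tuple[str, str]]:
    """Pair each '#reason' comment in liftOver's unmapped output with the record after it."""
    failures = []
    reason = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line[1:].strip()  # remember the reason for the next record
        else:
            failures.append((line, reason or "unknown"))
            reason = None
    return failures


unmapped = """#Partially deleted in new (  Sequence insufficiently intersects one chain)
chr7:72209527-72209528"""
for record, why in parse_unmapped(unmapped):
    print(record, "->", why)
```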

Thanks Pierre. However, I have another example where a liftover does exist, this SNP:

   CHROM       POS REF   ALT    GT      AD
   chr17  43884083   C     T   0/1   16,25

Looking it up in gnomAD hg38 (lifted over from the hg19 gnomAD VCF) gives no results.

After lifting the position over to hg19, I get chr17:41961451. On the gnomAD website, the variant 17-41961451-T-C shows up at this position: https://gnomad.broadinstitute.org/variant/17-41961451-T-C

The genome sequence differs between hg19 and hg38 at this position: the reference base is T in hg19 but C in hg38, so REF and ALT are swapped between the two builds. Is there a way to annotate my hg38 variant of interest, 17-43884083-C-T, taking this into account? In this example, the gnomAD AF should be 0.8893. One idea would be to test every heterozygous SNP against gnomAD in both orientations, REF/ALT and ALT/REF (e.g. C-T and T-C).
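A minimal Python sketch of the "test both orientations" idea. It assumes a biallelic SNV at the same position in both coordinate sets and ignores strand flips and multi-allelic sites. When the alleles only match after swapping, your ALT is the database REF allele, so its frequency is 1 - AF rather than AF. The `match_af` name and the AF value used below are hypothetical:

```python
from typing import Optional, Tuple


def match_af(query_ref: str, query_alt: str,
             db_ref: str, db_alt: str, db_af: float) -> Optional[Tuple[float, bool]]:
    """Return (AF of query_alt, swapped?) or None if the alleles do not match.

    Assumes a biallelic SNV where db_af is the frequency of db_alt.
    """
    if (query_ref, query_alt) == (db_ref, db_alt):
        return db_af, False
    if (query_ref, query_alt) == (db_alt, db_ref):
        # REF/ALT swapped between builds: query_alt is the database REF allele,
        # whose frequency at a biallelic site is 1 - AF(db_alt).
        return 1.0 - db_af, True
    return None


# hg38 call C>T against an hg19-style record T>C with a hypothetical AF of 0.25
print(match_af("C", "T", "T", "C", 0.25))
```

Returning the `swapped` flag alongside the frequency keeps it visible which annotations came from a reversed match, so they can be reviewed separately.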

