Hi,
I'm currently working with eQTL data from sc-eQTLGen and trying to annotate it with ref/alt alleles and allele frequency (GRCh37) using R. The data contains an AlleleAssessed column, which I assume contains the reference allele from dbSNP.
I know the R packages SNPlocs.Hsapiens.dbSNP144.GRCh37 and MafDb.gnomAD.r2.1.hs37d5
can give me access to the minor allele frequency from gnomAD using the SNP rs ID or chromosome + position, but the only information I get on the ref and alt alleles is the alleles_as_ambig column returned by the gscores function, which gives alleles as IUPAC ambiguity codes.
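To illustrate the problem, here is a minimal base-R sketch. The lookup table holds the standard IUPAC nucleotide codes; the `expand_iupac` helper is a hypothetical name of my own and is not part of either package. It shows that an ambiguity code can be expanded into its set of alleles, but it does not say which allele is ref and which is alt:

```r
# Standard IUPAC nucleotide ambiguity codes and the bases each one stands for.
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper (illustration only): expand each code into a
# character vector of single-base alleles, in alphabetical order.
expand_iupac <- function(codes) {
  strsplit(unname(iupac[codes]), "")
}

# "R" expands to A and G, but nothing here marks either one as the reference.
expand_iupac(c("R", "Y", "H"))
```

So for a biallelic SNP I can recover the two alleles, but I would still need another source to decide which one is the reference.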
How can I rapidly get ref/alt alleles and allele frequency for ~1M SNPs in R? I considered downloading the gnomAD database and querying it with tabix, but the database contains way too much information, causing the tabix query to be very slow.