I have been trying to find the exact allele frequency for the indel GTGTGTGTGTGTGT/- at chr7:121680790-121680803 using the UCSC Genome Browser, and got this: (https://genome.ucsc.edu/cgi-bin/hgc?hgsid=480764461_B3MhdXBZJZy6eIECQ9RMLMDiN2jF&c=chr7&o=121680783&t=121680793&g=snp138&i=rs72117862). That page gave me the rs ID, which I then looked up in dbSNP, but there too I could not find the MAF (minor allele frequency) for this variant. Can someone please suggest how I can find the allele frequency for this variant (or tell whether it is a rare variant)?
Hi, that SNP (rs72117862) is present in the All SNPs (138) track in the UCSC Table Browser (hg19). Here is the record:
#bin chrom chromStart chromEnd name score strand refNCBI refUCSC observed molType class valid avHet avHetSE func locType weight exceptions submitterCount submitters alleleFreqCount alleles alleleNs alleleFreqs bitfields
1513 chr7 121680783 121680793 rs72117862 0 + GTGTGTGTGT GTGTGTGTGT -/GTGTGTGTGT genomic deletion unknown 0 0 intron range 1 1 DEVINE_LAB, 0
Going by the UCSC entry, I don't think this record has an allele frequency associated with it (alleleFreqCount is 0). Maybe contact the submitter lab?
Maybe that is because no MAF has been determined for this INDEL. I found this paper, "An initial map of insertion and deletion (INDEL) variation in the human genome", which proposes an assay to estimate the MAF of some known INDELs. It should explain why computing INDEL MAFs is not cheap, and why some INDELs (like yours) have no MAF.
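If you want to run the same check on other records, here is a rough Python sketch that reads one tab-separated row from the snp138 table and reports whether any allele frequencies are attached. Two things are my own assumptions: the function names (parse_row, allele_frequencies) are made up for illustration, and I have guessed where the empty fields in the pasted row fall (exceptions, alleles, alleleNs, alleleFreqs and bitfields), since the spacing was lost when the row was pasted.

```python
# Column names copied from the Table Browser header line above.
HEADER = (
    "bin chrom chromStart chromEnd name score strand refNCBI refUCSC "
    "observed molType class valid avHet avHetSE func locType weight "
    "exceptions submitterCount submitters alleleFreqCount alleles "
    "alleleNs alleleFreqs bitfields"
).split()


def parse_row(line):
    """Split one tab-separated row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(HEADER):
        raise ValueError(
            "expected %d fields, got %d" % (len(HEADER), len(fields))
        )
    return dict(zip(HEADER, fields))


def allele_frequencies(record):
    """Return {allele: frequency}; empty dict if none were submitted."""
    if int(record["alleleFreqCount"]) == 0:
        return {}
    # Both columns are comma-separated lists, usually with a trailing comma.
    alleles = [a for a in record["alleles"].split(",") if a]
    freqs = [float(f) for f in record["alleleFreqs"].split(",") if f]
    return dict(zip(alleles, freqs))


# The row from this thread. ASSUMPTION: the positions of the empty
# fields are guessed, because the original tabs were not preserved.
ROW = "\t".join([
    "1513", "chr7", "121680783", "121680793", "rs72117862", "0", "+",
    "GTGTGTGTGT", "GTGTGTGTGT", "-/GTGTGTGTGT", "genomic", "deletion",
    "unknown", "0", "0", "intron", "range", "1", "", "1", "DEVINE_LAB,",
    "0", "", "", "", "",
])

record = parse_row(ROW)
freqs = allele_frequencies(record)
if freqs:
    print(record["name"], freqs)
else:
    print(record["name"], "has no allele frequencies in this table")
```

For rs72117862 this takes the "no allele frequencies" branch, which matches what the browser page shows.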
Hth
Thank you. So do you think this SNP is of any significance, and that it could be a disease-causing variant?
No! My answer is strictly about the MAF.
Testing for association with a disease is a different story. So let me ask you this: are all SNPs and INDELs of known MAF unassociated with any disease? If so, can you explain what the GWAS catalog is all about?
MAF tells you how frequent the non-reference allele is in the population (more or less). Moreover, being the non-reference allele does not make it the recessive or the dominant one, which adds another layer. Whether it is disease-associated is something YOU have to answer based on your data.
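To spell out the "more or less": for a biallelic site the MAF is the frequency of whichever allele is rarer, and that is not always the non-reference one. A small sketch follows; the function name and the counts are made up for illustration and are not data for rs72117862.

```python
def biallelic_frequencies(ref_count, alt_count):
    """Return (ref_freq, alt_freq, maf) from allele counts at one site.

    Counts are numbers of chromosomes carrying each allele, so a
    diploid sample of N individuals contributes 2 * N chromosomes.
    """
    total = ref_count + alt_count
    if total <= 0:
        raise ValueError("need at least one observed allele")
    ref_freq = ref_count / total
    alt_freq = alt_count / total
    # The minor allele is simply the rarer of the two.
    return ref_freq, alt_freq, min(ref_freq, alt_freq)


# Hypothetical counts in which the non-reference allele is the common one.
ref_freq, alt_freq, maf = biallelic_frequencies(ref_count=30, alt_count=70)
print("ref=%.2f alt=%.2f MAF=%.2f" % (ref_freq, alt_freq, maf))
```

With these made-up counts the non-reference allele has frequency 0.70, so the MAF (0.30) belongs to the reference allele. None of this says anything about disease association.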