Question: Rare Variant SNPs With No Allele Frequency
6.1 years ago by
Nandini780 wrote:


I am working on the analysis of human genomes to detect rare variants.

One of the filtering steps, after removing the non-coding variants, is to check the allele frequency (less than 5%) in the 1000 Genomes database or the 6500 exomes database.

70% of the variants on my list end up with no allele frequency in either database.
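A minimal sketch of this filter, assuming each variant has already been annotated with an allele frequency, or with None when neither database lists it. The field names (`id`, `af`) and the example records are hypothetical. The variants without a frequency go into their own group instead of being silently dropped or kept.

```python
def partition_by_frequency(variants, max_af=0.05):
    """Split variants into rare, common, and unannotated groups."""
    rare, common, unannotated = [], [], []
    for variant in variants:
        af = variant.get("af")
        if af is None:
            # Not present in any frequency database
            unannotated.append(variant)
        elif af < max_af:
            rare.append(variant)
        else:
            common.append(variant)
    return rare, common, unannotated


# Hypothetical annotated variants
variants = [
    {"id": "var1", "af": 0.12},
    {"id": "var2", "af": 0.004},
    {"id": "var3", "af": None},
]
rare, common, unannotated = partition_by_frequency(variants)
print(len(rare), len(common), len(unannotated))
```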

Can these SNPs be considered for further analysis, or is there a way to deal with such SNPs?

Thank you.

modified 6.1 years ago by Matt Miossec320 • written 6.1 years ago by Nandini780

Sounds like most in your list are false positives.

written 6.1 years ago by lh331k
6.1 years ago by
Cambridge UK
Laura1.7k wrote:

There are a small number of variants which have an Allele Count of 0 and an Allele Frequency of 0.

This is because the original sample list for phase 1 had 1094 samples on it. After our integrated genotyping processes, 2 samples were discovered to have very discordant genotypes.

NA07346 NA11918

The decision was made to leave in any variant which had only been discovered in one or both of these individuals. The analysis group is still confident in these sites but not in their genotypes. As a result, we are left with some variant sites where no sample holds the non-reference allele.
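Sites like these carry an explicit allele count of 0, so they can be told apart from variants that are simply missing from the database. A rough sketch, assuming a VCF-style INFO string with an `AC` key; the helper names and the example strings are made up for illustration.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=0;AF=0;AN=2184' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags carry no value
    return fields


def has_zero_allele_count(info):
    """True when the site is listed but no sample carries an alternate allele."""
    ac = parse_info(info).get("AC")
    if ac is None or ac is True:
        return False  # no usable AC value, so nothing can be concluded
    # AC may hold one count per alternate allele, separated by commas
    return all(int(count) == 0 for count in ac.split(","))


# Made-up INFO strings
print(has_zero_allele_count("AC=0;AF=0;AN=2184"))
print(has_zero_allele_count("AC=3;AF=0.0014;AN=2184"))
```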


This is a very nice answer (I didn't know of these 2 individuals), but I don't think it can explain more than a few missing variants.

written 6.1 years ago by Giovanni M Dall'Olio26k
6.1 years ago by
Universidad Andrés Bello
Matt Miossec320 wrote:

Hi Nandini,

If you're in the business of looking for rare variants, you should expect some of them to have no attached allele frequency precisely because they're rare, so they might not have been seen convincingly enough to appear in either database. If you're worried that 70% seems too high, you can always check whether any of these have frequencies in dbSNP. Make sure you check whether the positions given in each database are 0-based or 1-based, i.e. whether the first position is counted as 0 or 1. Some dbSNP table dumps are 0-based whereas the VCF output from most callers is 1-based, so if you're writing your own scripts this could be important! Also make sure those SNPs have been called convincingly. If you don't find anything wrong after that, you can consider them for further analysis.
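To illustrate the coordinate point, here is a small sketch that normalises positions to 1-based before looking up frequencies. It assumes single-base variants, one source with 0-based starts and one with 1-based positions; the function name and all records are hypothetical.

```python
def to_one_based(chrom, start, zero_based):
    """Return a (chrom, 1-based position) key for a single-base variant."""
    return (chrom, start + 1 if zero_based else start)


# Hypothetical records from a 0-based source: (chrom, start, allele frequency)
zero_based_records = [("1", 99, 0.02), ("2", 499, 0.31)]
frequency_by_site = {
    to_one_based(chrom, start, zero_based=True): af
    for chrom, start, af in zero_based_records
}

# Hypothetical calls taken from a VCF, which is 1-based
calls = [("1", 100), ("2", 499)]
for chrom, pos in calls:
    key = to_one_based(chrom, pos, zero_based=False)
    # None means the site was not found in the lookup table
    print(chrom, pos, frequency_by_site.get(key))
```

If a large share of lookups only succeed after shifting positions by one, a coordinate mismatch is a more likely explanation than genuinely novel variants.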
