Rare Variant SNPs With No Allele Frequency
11.1 years ago
NB ▴ 960

Hello,

I am analysing human genomes to detect rare variants.

One of the filtering steps, after removing the non-coding variants, is to check that the allele frequency is less than 5% in the 1000 Genomes database or the 6500 exomes database.

70% of the variants in my list have no allele frequency in either database.

Can these SNPs be considered for further analysis, or is there a way to deal with such SNPs?

Thank you.


Sounds like most in your list are false positives.

11.1 years ago
Laura ★ 1.8k

http://www.1000genomes.org/faq/why-do-some-variants-phase1-release-have-zero-allele-frequency

There are a small number of variants which have an Allele Count of 0 and an Allele Frequency of 0.

This is because the original sample list for phase1 had 1094 samples on it. After our integrated genotyping processes, 2 samples were discovered to have very discordant genotypes.

NA07346 NA11918

The decision was made to leave in any variant which had only been discovered in one or both of these individuals. The Analysis group is still confident in their sites but not in their genotypes. In doing this we are left with some variant sites where no sample holds the non-reference allele.
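To tell these zero-frequency sites apart from variants that carry no frequency at all, something like the following might help. It is only a sketch: it assumes the frequency sits in an `AF` key of a VCF-style INFO string, and the function names `parse_info` and `frequency_status` are made up for illustration.

```python
def parse_info(info):
    """Turn a VCF-style INFO string such as 'AC=0;AF=0;AN=2184' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        # Flag-style entries have no '=' and are stored as True.
        fields[key] = value if sep else True
    return fields


def frequency_status(info):
    """Return 'no_frequency', 'zero_frequency' or 'observed' for one record.

    Assumption: the allele frequency is stored under the key 'AF'.
    """
    raw = parse_info(info).get("AF")
    if not isinstance(raw, str):
        return "no_frequency"
    # Multi-allelic sites list one value per alternate allele; '.' means missing.
    freqs = [float(x) for x in raw.split(",") if x not in ("", ".")]
    if not freqs:
        return "no_frequency"
    return "zero_frequency" if max(freqs) == 0.0 else "observed"
```

Counting how many records fall into each of the three groups shows quickly whether the AF=0 sites explain a noticeable share of a list.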


This is a very nice answer (I didn't know of these 2 individuals), but I don't think it can explain more than a few missing variants.

11.1 years ago

Hi Nandini,

If you're in the business of looking for rare variants, you should expect some of them to have no attached allele frequency precisely because they're rare (they may not have been seen convincingly enough to appear in either database). If you're worried that 70% seems too high, you can check whether any of these have frequencies in dbSNP. Make sure you check whether the positions given in each database are 1-based or 0-based, i.e. whether the first position on a chromosome is numbered 1 or 0. Some dbSNP tables are 0-based whereas output from most callers is 1-based, so if you're writing your own scripts this matters. Also make sure those SNPs have been called convincingly. If you don't find anything wrong after that, you can consider them for further analysis.
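If you do end up writing your own script, the coordinate check could look roughly like this. It is a sketch under assumptions, not a reference implementation: your calls are taken to be 1-based `(chrom, pos, ref, alt)` tuples, the database is taken to be already loaded into a dict from the same kind of tuple to a frequency, and `classify_calls` plus its parameters are hypothetical names. The 5% cutoff is the one from the question.

```python
def classify_calls(calls, db_freqs, db_is_zero_based, max_af=0.05):
    """Split calls into 'rare', 'common' and 'absent' against one database.

    calls:            iterable of 1-based (chrom, pos, ref, alt) tuples
    db_freqs:         dict mapping (chrom, pos, ref, alt) -> allele frequency
    db_is_zero_based: True if the database positions start counting at 0
    """
    # Shift 0-based database positions by one so both sides are 1-based.
    offset = 1 if db_is_zero_based else 0
    lookup = {
        (chrom, pos + offset, ref, alt): af
        for (chrom, pos, ref, alt), af in db_freqs.items()
    }

    result = {"rare": [], "common": [], "absent": []}
    for call in calls:
        af = lookup.get(call)
        if af is None:
            result["absent"].append(call)   # no frequency in this database
        elif af < max_af:
            result["rare"].append(call)
        else:
            result["common"].append(call)
    return result
```

Running it once with `db_is_zero_based=True` and once with `False` is a cheap sanity check: if the share of absent variants drops sharply under one setting, an off-by-one in the positions was probably behind many of the missing frequencies.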
