Variant Filtration By Exclusion Of Common Or Well-Known Variants
11.4 years ago
tommivat ▴ 250

I'm analyzing the results of a variant calling pipeline (VCP) run on human exomes, with the aim of easier data inspection and better data representation for, e.g., personalized medicine. At the moment, the VCP typically reports >100k SNPs. I would like to filter the results against known variation in the human population.

This kind of filtering is performed in most studies that, e.g., try to predict drug sensitivity from NGS results. However, the tools used (if any) are not reported. In their excellent article "The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity", Barretina et al. give the following details in the supplementary material.

Variant filtration by exclusion of common germline variants: Variants for which the global allele frequency (GAF) in dbSNP134 or allele frequency in the NHLBI Exome Sequencing Project (data release ESP2500) was higher than 0.1% were excluded from further analysis.
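To make the quoted rule concrete, here is a minimal sketch of the cutoff applied to calls that have already been annotated with population frequencies. The function name, the example records, and the choice to keep variants that are absent from both resources are assumptions of this sketch, not details given by Barretina et al.

```python
from typing import Optional

# Cutoff quoted from the supplementary material: 0.1% allele frequency.
MAX_POPULATION_AF = 0.001


def is_rare(dbsnp_gaf: Optional[float], esp_af: Optional[float],
            cutoff: float = MAX_POPULATION_AF) -> bool:
    """Return True if no known population frequency exceeds the cutoff.

    None means the variant is absent from that resource. Treating absence
    as "not common" is an assumption of this sketch.
    """
    known = [af for af in (dbsnp_gaf, esp_af) if af is not None]
    return all(af <= cutoff for af in known)


# Hypothetical annotated calls: (label, dbSNP GAF, ESP allele frequency).
calls = [
    ("var_common", 0.12, 0.10),
    ("var_rare", 0.0004, None),
    ("var_novel", None, None),
    ("var_esp_only", None, 0.02),
]

kept = [label for label, gaf, esp in calls if is_rare(gaf, esp)]
print(kept)  # ['var_rare', 'var_novel']
```

Note that a variant is dropped as soon as either resource reports a frequency above the cutoff, which matches the "or" in the quoted text.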

Is the NHLBI Exome Sequencing Project batch query tool the best tool for this kind of filtering? Can you also comment on the threshold frequency (0.1%) used in the study?

Variant filtration by exclusion of variants observed in a panel of normals: Variants detected in a panel of 278 whole exomes sequenced at the Broad as part of the 1000 Genomes Project were excluded from further analysis. Beyond removal of additional germline variation, this step also allowed elimination of common false positives that originate predominantly from alignment artifacts.
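Conceptually, the panel-of-normals step is a set difference: any call that also appears in at least one normal sample is dropped. The sketch below assumes variants are keyed by (chromosome, position, ref, alt) and uses made-up coordinates; the function names are hypothetical, and the actual procedure used in the paper is not described beyond the quoted text.

```python
from typing import Iterable, List, Set, Tuple

# A variant is identified by (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]


def build_panel(normal_samples: Iterable[Iterable[Variant]]) -> Set[Variant]:
    """Collect every variant seen in any normal sample into one set."""
    panel: Set[Variant] = set()
    for sample in normal_samples:
        panel.update(sample)
    return panel


def exclude_panel_variants(calls: Iterable[Variant],
                           panel: Set[Variant]) -> List[Variant]:
    """Keep only calls that were never observed in the panel of normals."""
    return [variant for variant in calls if variant not in panel]


# Hypothetical data: two normal exomes and one sample of interest.
normals = [
    [("1", 1000, "A", "G"), ("2", 5000, "C", "T")],
    [("1", 1000, "A", "G"), ("7", 42, "G", "A")],
]
sample_calls = [
    ("1", 1000, "A", "G"),   # seen in normals, removed
    ("3", 777, "T", "C"),    # not in panel, kept
    ("7", 42, "G", "C"),     # same site but different alt allele, kept
]

panel = build_panel(normals)
filtered = exclude_panel_variants(sample_calls, panel)
print(filtered)  # [('3', 777, 'T', 'C'), ('7', 42, 'G', 'C')]
```

Matching on the full (chromosome, position, ref, alt) key is a design choice here; matching on position alone would be stricter and would also remove the last call.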

Here, no particular tool is named. Are there any freely available tools for this kind of filtering?

More advanced topic: Implementing this kind of filtering at earlier stages of the VCP is an interesting idea that has already been discussed among computer scientists. Are there any existing VCPs able to do this kind of calling against several reference genomes? What do you think of this idea in general?

sequencing snp software human filtering
11.4 years ago

I wrote two posts about filtering some SNPs using EVS:

11.4 years ago

To my knowledge, filtering against the EVS dataset is currently the most effective way to identify common variants in exome sequencing data from human subjects. If a variant is present in experimental exome data but not in EVS, it usually warrants further inspection (assuming coverage and quality are acceptable as well). We used to filter only by dbSNP, under the assumption that dbSNP excludes pathogenic variants, but that is not always the case. There can be rare pathogenic alleles present in the heterozygous state in the healthy population that make it into dbSNP.

One likely reason no particular tools are mentioned in the publications you cite is that these groups, like many groups working in this field, queried the reference datasets with in-house scripts of various sorts.
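As an illustration of what such an in-house script might start with, the sketch below turns VCF data lines into lookup keys that can then be checked against a reference dataset. It relies only on the fixed VCF column order (CHROM, POS, ID, REF, ALT); the function names and the example lines are hypothetical, and real scripts would also need to handle normalization of indels.

```python
from typing import Iterable, Iterator, List, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def variant_keys(vcf_line: str) -> List[VariantKey]:
    """Split one VCF data line into one key per alternate allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt = fields[:5]
    # Multi-allelic sites list alternates separated by commas;
    # "." means no alternate allele was called.
    return [(chrom, int(pos), ref, allele)
            for allele in alt.split(",") if allele != "."]


def iter_keys(lines: Iterable[str]) -> Iterator[VariantKey]:
    """Yield keys for all data lines, skipping headers and blank lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield from variant_keys(line)


# Hypothetical VCF content (tab-separated, truncated after FILTER).
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\n",
    "1\t1000\t.\tA\tG\t50\tPASS\n",
    "2\t2000\trs0\tC\tT,G\t99\tPASS\n",
]

keys = list(iter_keys(example))
print(keys)  # [('1', 1000, 'A', 'G'), ('2', 2000, 'C', 'T'), ('2', 2000, 'C', 'G')]
```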

10.1 years ago
alexej.knaus ▴ 130

Using the frequency filter in GeneTalk, users can filter out common variants by frequency, based on data from the 1000 Genomes Project and the 6500 exomes data.

Register a free account at GeneTalk and upload your VCF file, which will be automatically annotated and preprocessed. After filtering, you can look at existing annotations (from dbSNP, HGMD, etc.) or create your own, which helps the community of users interpret medically relevant variants. You can comment on and rank annotations (by medical relevance and scientific evidence) and thus contribute your expertise to the community.

The platform is being developed at the Institute for Medical Genetics and Human Genetics at the Charité in Berlin.


