Question: How can I identify rare variants in VCF files produced by UnifiedGenotyper?
10 months ago by
United States
thejustpark60 wrote:


I am new to exome sequencing data analysis and would like advice on my situation. I have spent quite some time trying to figure this out on my own, but with no one around to guide me I have not gotten far. I was given VCF files, presumably produced by GATK UnifiedGenotyper, for case and control samples (A1 and A2 are our cases and B1 is our control), namely case[or control].indel.raw.vcf, case[or control].snp.raw.vcf, and case[or control].var.raw.vcf. I now need to 1) identify rare variants (SNPs or indels with a frequency below 0.01% in ExAC or gnomAD) that are present only in the two cases and not in the control, and 2) obtain PolyPhen/SIFT or other scores for the identified rare variants.

My questions are: 1) The GATK manual says that, because UnifiedGenotyper produces many false positives, these files need extensive filtering. However, I can't find in the manual which filters to apply or which tools to use. 2) Could you please suggest a pipeline for identifying rare variants and obtaining PolyPhen/SIFT scores from the VCF files?

Thank you very much for your time.

10 months ago by
finswimmer9.9k wrote:

Hello thejustpark,

Regarding your question about filtering out false-positive variants, I recommend reading this blog post first. After that, you can take your first steps with this tutorial. Keep in mind, though, that there is no gold standard for hard filtering, as it depends on many factors.
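To make the idea of hard filtering concrete, here is a minimal Python sketch that checks the INFO annotations of a single SNP record against fixed cutoffs. The thresholds (QD < 2.0, FS > 60.0, MQ < 40.0) are commonly cited starting points for SNPs and are an assumption here, not a recommendation for this particular data set; the function names are hypothetical. In practice a dedicated tool would apply such expressions to the whole file.

```python
# Assumed starting thresholds for SNPs; adjust them to your own data.
# Each entry maps an INFO annotation to a test that returns True on failure.
SNP_FILTERS = {
    "QD": lambda v: v < 2.0,   # low quality relative to depth
    "FS": lambda v: v > 60.0,  # strong strand bias
    "MQ": lambda v: v < 40.0,  # low mapping quality
}


def parse_info(info):
    """Turn 'QD=1.5;FS=3.2;DB' into {'QD': '1.5', 'FS': '3.2', 'DB': True}."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def failed_filters(info, filters=SNP_FILTERS):
    """Return the annotation names that fail; missing annotations are skipped."""
    fields = parse_info(info)
    failed = []
    for name, fails in filters.items():
        raw = fields.get(name)
        if raw is None or raw is True:
            continue
        try:
            value = float(raw)
        except ValueError:
            continue
        if fails(value):
            failed.append(name)
    return failed
```

A record whose list of failed filters is empty would be kept. Indels usually need a different set of cutoffs, so the SNP and indel files should be treated separately.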

The easiest way to get population frequencies is to use Ensembl's VEP. For missense variants it also reports PolyPhen/SIFT scores. There are other ways to do this; the term to search for is "variant annotation". Other tools that can do this include SnpEff and bcftools.
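Before or after annotation, the variants still have to be restricted to those seen in both cases and not in the control. A rough Python sketch of that step follows, assuming the case VCF holds samples A1 and A2, the control VCF holds B1, and both are passed in as iterables of text lines. The function names are hypothetical, and the comparison is by exact chromosome, position, REF, and ALT, so indels would need to be normalized first.

```python
def carriers(vcf_lines):
    """Map each (chrom, pos, ref, alt) to the set of samples carrying that alt."""
    samples = []
    result = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            continue
        chrom, pos, ref = cols[0], int(cols[1]), cols[3]
        alts = cols[4].split(",")
        format_keys = cols[8].split(":")
        if "GT" not in format_keys:
            continue
        gt_index = format_keys.index("GT")
        for name, sample_field in zip(samples, cols[9:]):
            genotype = sample_field.split(":")[gt_index]
            for allele in genotype.replace("|", "/").split("/"):
                # '0' is the reference allele and '.' is a missing call.
                if allele.isdigit() and int(allele) > 0:
                    key = (chrom, pos, ref, alts[int(allele) - 1])
                    result.setdefault(key, set()).add(name)
    return result


def case_only(case_lines, control_lines, case_samples):
    """Variants carried by every listed case sample and by no control sample."""
    case = carriers(case_lines)
    control = carriers(control_lines)
    wanted = set(case_samples)
    return sorted(
        key for key, names in case.items()
        if wanted <= names and key not in control
    )
```

For example, `case_only(open("case.snp.raw.vcf"), open("control.snp.raw.vcf"), ["A1", "A2"])` would return the shared case-only SNPs. Note that a variant missing from the control VCF may simply lack coverage in B1, so absence is not proof of a reference genotype.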

fin swimmer

10 months ago by
rse70 wrote:

Hi, you can annotate against population databases and then filter.
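The filtering step after annotation could look roughly like the Python sketch below. It assumes the annotation tool has already produced a lookup from variant to population allele frequency (the `population_af` dictionary is hypothetical), and that a variant with no entry was never observed in the database and therefore counts as rare. One detail worth checking: a cutoff of 0.01% is 0.0001 as a fraction, not 0.01.

```python
MAX_AF = 0.0001  # 0.01 % expressed as a fraction


def is_rare(allele_frequency, max_af=MAX_AF):
    """True if the frequency is below the cutoff or missing (None)."""
    if allele_frequency is None:
        return True  # assumption: not seen in the database means rare
    return allele_frequency < max_af


def keep_rare(variants, population_af, max_af=MAX_AF):
    """Keep the variants whose annotated population frequency is rare."""
    return [v for v in variants if is_rare(population_af.get(v), max_af)]
```

Whether unobserved variants should be kept depends on how well the region is covered in the population database, so that default is a judgment call rather than a rule.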
