Question

How to find variants that are common in one population but rare in others (population differentiation)?

3

8.5 years ago

SOHAIL ▴ 410

Hi Everyone,

I have two VCF files: (1) a VCF file containing SNPs from my population, and (2) a VCF file with the 1000 Genomes SNP data set from 26 populations.

I want to extract the variants that are rare (<0.5%) in the global samples (1000G) but common in my population (>5%). Can anyone suggest how to do that?

Thank you!

SNP wgs variant filtration population genomics • 3.9k views

1

8.5 years ago

Ram 45k

You should be able to use plink/vcftools to filter the 1000G VCF and get a BED file of the rare sites. Then use that BED file to filter your own VCF file while applying a frequency filter at the same time.
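The two-pass idea above can be sketched in plain Python, without plink or vcftools. This is only an illustration under stated assumptions: both files carry an `AF` INFO field, chromosome names match between the files (`1` vs `chr1` would need harmonizing first), and sites are matched on chromosome, position, REF and ALT rather than on BED intervals. The function names `rare_sites` and `common_at_sites` are made up for this example.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=2;AF=0.01;DB' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def biallelic_records(lines):
    """Yield (site_key, af, line) for biallelic records with a numeric AF."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], cols[1], cols[3], cols[4], cols[7]
        if "," in alt:
            continue  # simplification: multiallelic sites are skipped
        af = parse_info(info).get("AF")
        if af in (None, "", "."):
            continue  # no usable allele frequency on this record
        yield (chrom, pos, ref, alt), float(af), line


def rare_sites(reference_lines, max_af=0.005):
    """Pass 1: collect sites that are rare in the reference panel."""
    return {key for key, af, _ in biallelic_records(reference_lines) if af < max_af}


def common_at_sites(sample_lines, sites, min_af=0.05):
    """Pass 2: keep own records at those sites that are common locally."""
    return [
        line
        for key, af, line in biallelic_records(sample_lines)
        if key in sites and af > min_af
    ]
```

Both functions accept any iterable of lines, so an open file handle works as well as a list. Note that this approach only returns variants that are present in the 1000G file; variants that are completely absent from 1000G are dropped, even though they are arguably rare globally too.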

score 1 · Accepted Answer · 2016-12-28

1

8.5 years ago

donfreed ★ 1.6k

You can do this fairly easily with GATK, assuming that your VCF has the AF INFO field annotation. First, annotate the variants in your VCF with their allele frequencies in 1000 Genomes:

java -jar $GATK -R reference.fasta -T VariantAnnotator -V input.vcf -o output_1.vcf --resource:onekg 1000genomes.vcf --expression onekg.AF

Then look for sites with AF > 0.05 and onekg.AF < 0.005 using GATK's SelectVariants.

java -jar GenomeAnalysisTK.jar -R reference.fasta -T SelectVariants --variant output_1.vcf -o output_2.vcf -select "AF > 0.05 && onekg.AF < 0.005"
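To make the selection logic explicit, here is a small Python sketch of the same test applied to the INFO column of the annotated VCF. It is not how GATK evaluates the expression internally. The helper names and the `keep_if_absent` switch are hypothetical, and for multi-valued fields only the first value is read, which is a simplification. The switch exists because sites missing from 1000G get no `onekg.AF` annotation, and whether to count them as globally rare is a choice you have to make.

```python
def info_value(info, key):
    """Return the first value of an INFO key as a float, or None if absent."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key and value not in ("", "."):
            # simplification: only the first ALT allele's value is used
            return float(value.split(",")[0])
    return None


def passes(info, min_af=0.05, max_ref_af=0.005, ref_key="onekg.AF",
           keep_if_absent=False):
    """True if the site is common locally (AF) and rare in the panel (ref_key)."""
    af = info_value(info, "AF")
    ref_af = info_value(info, ref_key)
    if af is None:
        return False  # no local frequency, nothing to test
    if ref_af is None:
        # site not annotated from the reference panel: caller decides
        return keep_if_absent and af > min_af
    return af > min_af and ref_af < max_ref_af
```

For example, `passes("AC=5;AF=0.10;onekg.AF=0.001")` is true, while `passes("AF=0.10")` is false unless `keep_if_absent=True` is passed.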

0

Hi @donfreed,

I tried VariantAnnotator, but it gives me an odd error message:

ERROR -- ERROR stack trace

java.lang.NullPointerException
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.initialize(VariantAnnotator.java:284)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Reply by SOHAIL ▴ 410, 8.5 years ago

1

Hi @donfreed,

Thanks for your code, the problem is solved. I previously had an issue with GATK, but now it works fine.

Reply by SOHAIL ▴ 410, 8.5 years ago