How to find variants that are common in one population but rare in others (population differentiation)?
4.9 years ago
SOHAIL ▴ 340

Hi Everyone,

I have two VCF files: 1) a VCF file containing SNPs from my population, and 2) a VCF file with the 1000 Genomes SNP data set from 26 populations.

I want to extract the variants that are rare (<0.5%) in the global samples (1000G) but common (>5%) in my population. Can anyone please suggest a way to do that?

Thank you!

SNP wgs variant filtration population genomics • 2.3k views
4.9 years ago
donfreed ★ 1.6k

You can do this fairly easily with GATK, assuming your VCF has the AF INFO field annotation. First, annotate the variants in your VCF with their allele frequency in 1000 Genomes (the commands below use GATK 3 syntax).

java -jar GenomeAnalysisTK.jar -R reference.fasta -T VariantAnnotator -V input.vcf -o output_1.vcf --resource:onekg 1000genomes.vcf --expression onekg.AF

Then look for sites with AF > 0.05 and onekg.AF < 0.005 using GATK's SelectVariants.

java -jar GenomeAnalysisTK.jar -R reference.fasta -T SelectVariants --variant output_1.vcf -o output_2.vcf -select "AF > 0.05 && onekg.AF < 0.005"
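
For readers who want to see the annotate-then-select logic spelled out, here is a minimal pure-Python sketch of the same idea. It is an illustration only, not part of the answer above or of GATK. It assumes that both files carry an `AF` INFO field and that multi-allelic sites have been split into one ALT per record. The names `info_af`, `read_af`, `select_enriched`, and the `keep_absent` switch are hypothetical. One design choice is made explicit here: a variant that is missing from 1000G entirely has no reference frequency to compare against, so the caller decides whether to keep it.

```python
def info_af(info):
    """Return the first AF value from a VCF INFO string, or None if absent."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return float(field[3:].split(",")[0])
    return None


def read_af(vcf_lines):
    """Map (chrom, pos, ref, alt) -> AF for every record that has an AF."""
    table = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
        af = info_af(info)
        if af is not None:
            table[(chrom, pos, ref, alt)] = af
    return table


def select_enriched(local_lines, onekg_lines,
                    min_local=0.05, max_onekg=0.005, keep_absent=False):
    """Variants with local AF > min_local and 1000G AF < max_onekg.

    keep_absent (an assumption of this sketch) controls whether variants
    missing from the 1000G file are treated as globally rare.
    """
    onekg = read_af(onekg_lines)
    hits = []
    for key, af in read_af(local_lines).items():
        if af <= min_local:
            continue
        ref_af = onekg.get(key)
        if ref_af is None:
            if keep_absent:
                hits.append(key)
        elif ref_af < max_onekg:
            hits.append(key)
    return hits
```

Called with two open file handles, for example `select_enriched(open("input.vcf"), open("1000genomes.vcf"))`, it returns a list of `(chrom, pos, ref, alt)` keys. Loading a full 1000 Genomes VCF into a dictionary this way is memory-hungry, so for real data a dedicated tool remains the practical choice.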

Hi @donfreed!,

I tried VariantAnnotator, but it fails with an odd error message:

ERROR -- ERROR stack trace

java.lang.NullPointerException
    at org.broadinstitute.gatk.tools.walkers.annotator.VariantAnnotator.initialize(VariantAnnotator.java:284)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:316)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:123)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:108)
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions https://software.broadinstitute.org/gatk
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------


Hi @donfreed!

Thanks for your code, the problem is solved. I previously had an issue with GATK, but now it works fine.

4.9 years ago
Ram 35k

You should be able to use plink/vcftools to filter the 1000G VCF for the rare variants (<0.5%) and write their positions to a BED file. Then use that BED file to restrict your own VCF to those positions while simultaneously applying the frequency filter (>5%).
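
The BED-based idea above can be sketched in plain Python to show the moving parts. This is a hedged illustration, not plink or vcftools usage: it assumes both files carry an `AF` INFO field, and the function names `rare_sites_to_bed` and `filter_by_bed` are hypothetical. Two caveats follow from the approach itself: a BED file matches by position only, so a different alternate allele at the same site also passes, and variants absent from 1000G never appear in the BED file and are therefore dropped.

```python
def _af(info):
    """First AF value in a VCF INFO string, or None if there is none."""
    for field in info.split(";"):
        if field.startswith("AF="):
            return float(field[3:].split(",")[0])
    return None


def rare_sites_to_bed(onekg_lines, max_af=0.005):
    """BED rows (chrom, 0-based start, end) for 1000G sites with AF < max_af."""
    bed = []
    for line in onekg_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.rstrip("\n").split("\t")
        af = _af(cols[7])
        if af is not None and af < max_af:
            pos = int(cols[1])           # VCF positions are 1-based
            bed.append((cols[0], pos - 1, pos))
    return bed


def filter_by_bed(local_lines, bed, min_af=0.05):
    """Keep local records that fall in a BED interval and have AF > min_af."""
    # Expand the half-open BED intervals back to 1-based positions.
    covered = {(chrom, p) for chrom, start, end in bed
               for p in range(start + 1, end + 1)}
    kept = []
    for line in local_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        af = _af(cols[7])
        if (cols[0], int(cols[1])) in covered and af is not None and af > min_af:
            kept.append(line.rstrip("\n"))
    return kept
```

The subtraction of 1 in `rare_sites_to_bed` reflects the coordinate conventions: BED starts are 0-based and ends are exclusive, while VCF positions are 1-based.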
