Question: How to find variants that are common in one population but rare in others (population differentiation)?
3.9 years ago by
Beijing Institute of Genomics, CAS.
SOHAIL wrote:

Hi Everyone,

I have two VCF files: (1) a VCF file containing SNPs from my population, and (2) a VCF file with the 1000 Genomes SNP data set from 26 populations.

I want to extract the variants that are rare (<0.5%) in the global samples (1000G) but common (>5%) in my population. Can anyone suggest how to do that?

Thank you!

3.9 years ago by
San Francisco
donfreed wrote:

You can do this fairly easily with GATK, assuming your VCF has the AF INFO field annotation. First, annotate the variants in your VCF with their allele frequency in 1000 Genomes.

java -jar $GATK -R reference.fasta -T VariantAnnotator -V input.vcf -o output_1.vcf --resource:onekg 1000genomes.vcf --expression onekg.AF

Then look for sites with AF > 0.05 and onekg.AF < 0.005 using GATK's SelectVariants.

java -jar $GATK -R reference.fasta -T SelectVariants --variant output_1.vcf -o output_2.vcf -select "AF > 0.05 && onekg.AF < 0.005"
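
If it helps to see the selection logic spelled out, here is a minimal Python sketch of the same filter applied to the annotated VCF. It is not part of GATK: the function names are made up for illustration, it assumes the annotated file carries both `AF` and `onekg.AF` in the INFO column, and it only looks at the first ALT allele of multi-allelic sites. Sites that are absent from 1000 Genomes presumably get no `onekg.AF` annotation at all; this sketch skips them, although you may prefer to count them as rare.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AF=0.1;onekg.AF=0.001' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags have no '=value' part
    return fields


def first_float(value):
    """Return the first comma-separated value as a float, or None if unusable."""
    if value is None or value is True:
        return None
    try:
        # Simplification: only the first ALT allele's frequency is used.
        return float(value.split(",")[0])
    except ValueError:
        return None


def select_population_specific(lines, min_local=0.05, max_global=0.005):
    """Yield VCF data lines with AF > min_local and onekg.AF < max_global."""
    for line in lines:
        if line.startswith("#"):
            continue  # header lines are not filtered here
        columns = line.rstrip("\n").split("\t")
        info = parse_info(columns[7])
        local_af = first_float(info.get("AF"))
        global_af = first_float(info.get("onekg.AF"))
        # Assumption: sites missing either annotation are skipped.
        if local_af is None or global_af is None:
            continue
        if local_af > min_local and global_af < max_global:
            yield line.rstrip("\n")
```

You would call it on an open file handle, for example `for record in select_population_specific(open("output_1.vcf")): print(record)`.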

Hi @donfreed!,

I tried VariantAnnotator, but it gives me an odd error message:

ERROR --
ERROR stack trace
java.lang.NullPointerException
at
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.engine.CommandLineGATK.main(
ERROR ------------------------------------------------------------------------------------------
ERROR A GATK RUNTIME ERROR has occurred (version 3.7-0-gcfedb67):
ERROR
ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
ERROR If not, please post the error message, with stack trace, to the GATK forum.
ERROR Visit our website and forum for extensive documentation and answers to
ERROR commonly asked questions
ERROR
ERROR MESSAGE: Code exception (see stack trace for error itself)
ERROR ------------------------------------------------------------------------------------------

Reply written 3.9 years ago by SOHAIL

Hi @donfreed!

Thanks for your code. Problem solved. Previously I had an issue with GATK, but now it works fine.

Reply written 3.9 years ago by SOHAIL
3.9 years ago by
Baylor College of Medicine, Houston, TX
_r_am wrote:

You should be able to use plink/vcftools to filter the 1000 Genomes VCF by frequency and write the matching sites to a BED file. Then use that BED file to restrict your own VCF to those sites while applying a frequency filter to it at the same time.
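
The same two-pass idea can be sketched in plain Python, which may make the logic easier to follow before reaching for plink or vcftools. This is only an illustration under some assumptions: the function names are hypothetical, both VCFs are assumed to carry an `AF` entry in the INFO column, only the first ALT allele is considered, and sites are matched by chromosome and position alone (as a BED file would), so alleles are not compared. Chromosome names must also match between the two files (for example "1" versus "chr1").

```python
def info_af(info):
    """Return the first AF value from a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        if item.startswith("AF="):
            try:
                # Simplification: only the first ALT allele's frequency is used.
                return float(item[3:].split(",")[0])
            except ValueError:
                return None
    return None


def rare_sites(reference_lines, max_af=0.005):
    """Pass 1: collect (chrom, pos) of sites with AF < max_af in the reference panel."""
    sites = set()
    for line in reference_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        af = info_af(columns[7])
        if af is not None and af < max_af:
            sites.add((columns[0], int(columns[1])))
    return sites


def common_at_sites(sample_lines, sites, min_af=0.05):
    """Pass 2: yield records from your own VCF at those sites with AF > min_af."""
    for line in sample_lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        af = info_af(columns[7])
        if (columns[0], int(columns[1])) in sites and af is not None and af > min_af:
            yield line.rstrip("\n")
```

Note that, like the BED-based approach, this only reports variants that are present in the 1000 Genomes file; variants seen only in your population would need to be handled separately.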
