Question: Reliable Tools To Filter VCF Format Files
Tonyzeng wrote, 7.3 years ago:

I have VCF variant files. Can anyone provide a list of tools for variant filtering? Thank you.

vcf tools variant • 5.5k views
Comment by Biomonika (Noolean), 7.3 years ago:

What about vcftools? http://vcftools.sourceforge.net/
Comment by Ashutosh Pandey, 7.3 years ago:

Tony: For a general question like this, you should first go through similar questions on Biostar. You can easily find them using the search button. Post a new question only if you don't find a satisfying answer. Thanks.
William (Europe) wrote, 7.3 years ago:

GATK has a tool "SelectVariants" that has some standard filter options and you can create filter expressions based on the attributes in the vcf records: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_variantutils_SelectVariants.html

You can also use SnpSift to build filter expressions on the standard VCF attributes and on the ones added by SnpEff effect prediction: http://snpeff.sourceforge.net/SnpSift.html
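
To make the idea of filtering on VCF attributes concrete, here is a minimal Python sketch. It is not how SelectVariants or SnpSift work internally. It only shows the same kind of rule (QUAL and the INFO DP value above a threshold) applied to plain VCF text. The function names and the default thresholds are made up for illustration.

```python
def parse_info(info_field):
    """Turn 'DP=14;AF=0.5;DB' into {'DP': '14', 'AF': '0.5', 'DB': True}."""
    info = {}
    if info_field in (".", ""):
        return info
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        # Flags such as DB carry no '=' and are stored as True.
        info[key] = value if sep else True
    return info


def filter_records(lines, min_qual=30.0, min_depth=10):
    """Yield header lines unchanged, plus records passing both thresholds.

    min_qual and min_depth are illustrative defaults, not recommendations.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]                 # column 6: QUAL
        info = parse_info(fields[7])     # column 8: INFO
        if qual == "." or float(qual) < min_qual:
            continue
        if int(info.get("DP", 0)) < min_depth:
            continue
        yield line
```

To use it on a file, something like `sys.stdout.writelines(filter_records(open("my_vcf_file.vcf")))` should do, where the file name is a placeholder.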

Jeremy Leipzig (Philadelphia, PA) wrote, 7.3 years ago:

http://www.bioconductor.org/packages/2.12/bioc/html/VariantAnnotation.html

VCF objects are basically sample-subsettable GRanges, which is a very clever implementation.

dangenet wrote, 7.3 years ago:

If you want a quick and highly customizable way to filter vcfs, try perl one-liners.

perl -lne 'print $_ if ($_ =~ /0\/1/)' < my_vcf_file.vcf > filtered_vcf_file.vcf

will get you all the variants where a genotype has been called as "0/1". In English, this one-liner says "print the line if the line contains the string 0/1".

perl -lane 'print $F[5] if ($_ !~ /^#/)' < my_vcf_file.vcf > QUAL_scores.txt

will get you a list of all the QUAL scores. In English, this says "print the value in the sixth column if the line does not start with a # character".

My favorite perl one-liner guide is here. A one-liner is no replacement for a proper filtering script, but for getting a sense of the distribution of your data there's nothing better.
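
If you later want to grow the first one-liner into a small script, a rough Python sketch might look like the one below. Two things differ from the one-liner, and both are my assumptions about what you would want: header lines (starting with #) are kept so the output is still a VCF, and only the GT part of each sample column is checked rather than the whole line. The function names are made up.

```python
import sys

# Genotypes treated as heterozygous. Only "0/1" mirrors the one-liner;
# add "0|1" and "1|0" here if your calls are phased.
HET_GENOTYPES = {"0/1"}


def is_het(sample_field):
    """True if the GT subfield (first ':'-separated item) is heterozygous."""
    gt = sample_field.split(":", 1)[0]
    return gt in HET_GENOTYPES


def keep_line(line):
    """Keep header lines, and records where any sample is heterozygous."""
    if line.startswith("#"):
        return True
    fields = line.rstrip("\n").split("\t")
    # Sample columns start at column 10 (index 9) when genotypes are present.
    return any(is_het(sample) for sample in fields[9:])


def filter_stream(in_stream, out_stream):
    """Copy the lines that pass keep_line from in_stream to out_stream."""
    for line in in_stream:
        if keep_line(line):
            out_stream.write(line)
```

Calling `filter_stream(sys.stdin, sys.stdout)` at the end of the script lets you run it like the one-liner, with `< my_vcf_file.vcf > filtered_vcf_file.vcf`.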

Aaronquinlan (United States) wrote, 7.3 years ago:

While it imports your VCF into a database first, our GEMINI software is specifically designed to allow filtering of variants in VCF files based on genome annotations and sample genotypes.

See the thread "Gemini: Integrative Exploration Of Genetic Variation And Genome Annotations". Also, please see the documentation.

An example of a GEMINI query filtering variants based on allele frequency and functional impact:

$ gemini query -q "select * from variants \
                  where is_lof = 1 \
                  and aaf >= 0.01" my.db

Extend this to further filter based on sample Thelonius being a heterozygote:

$ gemini query -q "select * from variants \
                  where is_lof = 1 \
                  and aaf >= 0.01" \
         --gt-filter "gt_types.Thelonius == HET" \
         my.db

Bioch'Ti (France, Avignon) wrote, 7.3 years ago:

Hi,

You can also look at PLINK/SEQ, the extension of PLINK that handles VCF files: http://atgu.mgh.harvard.edu/plinkseq/overview.shtml

Best

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.3 years ago:

Filter using javascript: https://github.com/lindenb/jvarkit#-filtering-vcf-with-javascript-rhino-

/** keeps a variant if at least two samples
have a DP < 200 */
function myfilterFunction()
    {
    var samples = header.genotypeSamples;
    var countOkDp = 0;

    for (var i = 0; i < samples.size(); ++i)
        {
        var sampleName = samples.get(i);
        // skip samples without a genotype or without a DP value
        if (!variant.hasGenotype(sampleName)) continue;
        var genotype = variant.genotypes.get(sampleName);
        if (!genotype.hasDP()) continue;
        var dp = genotype.getDP();
        if (dp < 200) countOkDp++;
        }
    // "at least two" means >= 2
    return (countOkDp >= 2);
    }
myfilterFunction();


$ gunzip -c file.vcf.gz |\
   java -jar  dist/vcffilterjs.jar  SCRIPT_FILE=filter.js
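
For readers without Java at hand, the rule stated in the script's comment (keep a variant if at least two samples have DP < 200) can be sketched in plain Python as below. This is only an illustration working on raw VCF text, not part of jvarkit. The function names and parameters are made up.

```python
def sample_depths(format_field, sample_fields):
    """Return the per-sample DP values that are present, as integers."""
    keys = format_field.split(":")
    if "DP" not in keys:
        return []
    idx = keys.index("DP")
    depths = []
    for sample in sample_fields:
        values = sample.split(":")
        # Trailing subfields may be dropped and missing values are '.'.
        if idx < len(values) and values[idx] not in (".", ""):
            depths.append(int(values[idx]))
    return depths


def passes(line, max_dp=200, min_samples=2):
    """True if at least min_samples samples have DP below max_dp."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT or sample columns
    depths = sample_depths(fields[8], fields[9:])
    return sum(1 for dp in depths if dp < max_dp) >= min_samples
```

Header lines are not handled by `passes`, so write them through before testing the records.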

Sean Davis (National Institutes of Health, Bethesda, MD) wrote, 7.3 years ago:

The SnpSift package has a SnpSift filter operation that is quite powerful and performant.

http://snpeff.sourceforge.net/SnpSift.html#filter
