Question: Which Is The Best Way For Filtering A Vcf File For Only Variation Sites?
dapregi (Spain) wrote, 6.5 years ago:

When creating VCF files with bcftools, I can output only the variant positions using the -v option. But when I already have a VCF file with all positions and I want to keep only the ones with variation, which is the most appropriate option in vcftools?

I could do

 bcftools view -Scvg allpositions.vcf.gz > only_variants.vcf

But re-calling the SNVs and genotypes seems like a waste of CPU cycles to me.

I am not sure if it would be better to use any of these options in vcftools:

--non-ref-ac <float>
--max-non-ref-ac <float>
      Include only sites with all Non-Reference Allele Counts within the specified range.

Or:

--min-alleles <int>
--max-alleles <int>
     Include only sites with a number of alleles within the specified range. For example, to include only bi-allelic sites, one could use:
     vcftools --vcf file1.vcf --min-alleles 2 --max-alleles 2

Any help would be appreciated, and sorry if this has been asked before; I could not find an answer to it.

vcf bcftools vcftools • 4.7k views

What exactly do you mean by "only variation sites"?

written 6.5 years ago by Giovanni M Dall'Olio

Sites where an alternative allele has been found and the genotype is "0/1" or "1/1". This is for single-sample VCF files. When I have a large multi-sample VCF, I apply another filter, such as 'remove all-homozygote positions', to deal with reference alleles that are in fact variants not seen in our population.

written 6.5 years ago by dapregi
swbarnes2 (United States) wrote, 6.5 years ago:

What's wrong with grepping, getting all lines with '0/1' and '1/1'?
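
A minimal sketch of that grep approach, assuming a single-sample, uncompressed VCF. The file name allpositions.vcf and the three records are made up for illustration. Header lines (starting with '#') are kept so that the output is still a valid VCF. Note that the pattern is matched anywhere in the line, so this relies on "0/1" and "1/1" not appearing in other columns.

```shell
# Build a tiny single-sample VCF for illustration (records are made up).
printf '##fileformat=VCFv4.1\n' > allpositions.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n' >> allpositions.vcf
printf 'chr1\t100\t.\tA\t.\t50\t.\t.\tGT\t0/0\n' >> allpositions.vcf
printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT\t0/1\n' >> allpositions.vcf
printf 'chr1\t300\t.\tG\tA\t50\t.\t.\tGT\t1/1\n' >> allpositions.vcf

# Keep header lines plus every record that contains 0/1 or 1/1.
grep -E '^#|0/1|1/1' allpositions.vcf > only_variants.vcf
```

For a gzipped input, the same pattern could be fed from `zcat allpositions.vcf.gz` instead of reading the file directly.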


There is nothing wrong with doing that, and it is what I usually do, but I was wondering whether vcftools has a better or faster way of doing it.

written 6.5 years ago by dapregi

You should also grep for "0|1" and "1|1" to catch phased genotypes.

written 6.5 years ago by Giovanni M Dall'Olio
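
To cover phased and unphased genotypes in one pass, the check can be restricted to the genotype itself rather than the whole line. The sketch below is one possible way to do that with awk, not something taken from vcftools. It assumes a single-sample VCF (sample in column 10) with GT as the first FORMAT subfield; the file names and records are made up. It also accepts "1/0" and "1|0", which is an addition to the genotypes named in this thread, since a phased heterozygote can be written either way.

```shell
# Build a tiny single-sample VCF with phased and unphased genotypes (made-up records).
printf '##fileformat=VCFv4.1\n' > phased_example.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n' >> phased_example.vcf
printf 'chr1\t100\t.\tA\t.\t50\t.\t.\tGT:GQ\t0|0:99\n' >> phased_example.vcf
printf 'chr1\t200\t.\tC\tT\t50\t.\t.\tGT:GQ\t0|1:99\n' >> phased_example.vcf
printf 'chr1\t300\t.\tG\tA\t50\t.\t.\tGT:GQ\t1|0:99\n' >> phased_example.vcf
printf 'chr1\t400\t.\tT\tC\t50\t.\t.\tGT:GQ\t1|1:99\n' >> phased_example.vcf
printf 'chr1\t500\t.\tA\tG\t50\t.\t.\tGT:GQ\t0/1:99\n' >> phased_example.vcf

awk -F'\t' '
  /^#/ { print; next }            # pass header lines through unchanged
  {
    split($10, f, ":")            # f[1] holds the GT value of the single sample
    # Keep heterozygous and homozygous-alternate calls, phased (|) or unphased (/).
    if (f[1] ~ "^(0[/|]1|1[/|]0|1[/|]1)$") print
  }' phased_example.vcf > phased_variants.vcf
```

Because only the GT value is tested, a stray "0/1" in another column cannot pull in a homozygous-reference record.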