Which Is The Best Way To Filter A VCF File For Only Variation Sites?
10.2 years ago
dapregi ▴ 50

When creating VCF files with bcftools, I can output only the variant positions using the -v option. But when I already have a VCF file with all positions and want to keep only the ones with variations, which is the most appropriate option in vcftools?

I could do

 bcftools view -Scvg allpositions.vcf.gz > only_variants.vcf

But re-calling the SNVs and genotypes seems a waste of CPU cycles to me.

I am not sure if it would be better to use any of these options in vcftools:

--non-ref-ac <float>
--max-non-ref-ac <float>
      Include only sites with all Non-Reference Allele Counts within the specified range.

Or:

--min-alleles <int>
--max-alleles <int>
     Include only sites with a number of alleles within the specified range. For example, to include only bi-allelic sites, one could use:
     vcftools --vcf file1.vcf --min-alleles 2 --max-alleles 2

Any help would be appreciated, and sorry if this has been asked before but I am not finding an answer for it.

vcftools bcftools vcf • 7.2k views

What exactly do you mean by "only variation sites"?


Sites where an alternative allele has been found and the genotype is "0/1" or "1/1". This is for single-sample VCF files. When I have a large multi-sample VCF, I apply another filter like 'remove all-homozygote positions' to deal with reference alleles that are in fact variants not seen in our population.

10.2 years ago

What's wrong with grepping, getting all lines with '0/1' and '1/1'?


There is nothing wrong with doing that, and it is what I usually do, but I was wondering if vcftools has a better/faster way of doing it.


You should also grep for "0|1" and "1|1" to catch phased genotypes.
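
For reference, the two grep suggestions in this thread could be combined into one pattern, roughly as sketched below. This is only a sketch under some assumptions: the VCF is uncompressed and single-sample, GT is the first FORMAT field, and only the genotypes named in this thread (0/1, 1/1, 0|1, 1|1) count as variant, so multi-allelic calls such as 1/2 would be dropped. The function name and file names are made up for illustration.

```shell
# Sketch only: assumes an uncompressed, single-sample VCF in which GT is the
# first FORMAT field, so the genotype follows a tab and precedes ':' or line end.
filter_variant_sites() {
  # Keep header lines ('#...') and records genotyped 0/1, 1/1, 0|1 or 1|1.
  grep -E '^#|[[:space:]](0|1)[/|]1(:|$)' "$1"
}

# Example call (hypothetical file names):
# filter_variant_sites allpositions.vcf > only_variants.vcf
```

Keeping the `^#` alternative in the pattern preserves the header, so the output is still a valid VCF for downstream tools. For a gzipped input, piping through `zcat` into the same grep pattern should work as well.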

