Question: Remove invariant sites from a VCF file
3.4 years ago, aberry814 (United States) wrote:

I have a VCF file with ~30K sites across 131 samples. I am trying to make it include only variant sites, meaning I want to exclude loci where all of my 131 samples have the same genotype, regardless of what the reference allele is. I used GATK SelectVariants with the -env option, but that only excludes sites where all samples are 0/0 (homozygous reference), not sites where all samples are 1/1 (homozygous alternate).

I am a pretty terrible coder and struggle to modify VCF files.

My question is: Does anybody have a script or know of a tool that can remove the entire site (line) if all 131 samples (columns?) have 1/1 in the genotype position? Or more generally, if all samples have the same genotype at that site, whether it be 0/0, 0/1, or 1/1 (GATK can do the 0/0 and 0/1, but if it's easier to kill 3 birds with one stone then no problem).



Tags: snp, vcf
3.4 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

using VCFFilterjs:

 java -jar dist/vcffilterjs.jar -e 'function accept(v) {var g0= v.getGenotype(0);for(var i=1;i< v.getNSamples();i++) {if(!v.getGenotype(i).sameGenotype(g0)) return true;} return false;}accept(variant);'  input.vcf

Thanks so much! It seems to have worked perfectly.

written 3.4 years ago by aberry814

Hi Pierre! I am using your solution to get rid of the same sites as aberry814, but this does not seem to eliminate the positions for which all genotyped individuals are 1/1, 0/0 or 0/1 AND some individuals have missing data. I guess a simple modification could do it?

Thanks a lot in advance!


written 2.3 years ago by B.MartinezCruz
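
For the missing-data case raised in the last comment, one possible approach outside vcffilterjs is a small standalone filter. What follows is only a minimal Python sketch, not part of the original answer and not run on either poster's data. It assumes an uncompressed, tab-delimited, multi-sample VCF whose FORMAT column contains GT; it ignores phasing, treats any genotype containing `.` as missing, and drops a site when all called samples share one genotype. The function names and the script name are hypothetical.

```python
import sys


def called_genotypes(sample_fields, gt_index):
    """Return the set of distinct called genotypes at one site, skipping missing calls."""
    seen = set()
    for field in sample_fields:
        parts = field.split(":")
        gt = parts[gt_index] if gt_index < len(parts) else "."
        # Treat phased (0|1) and unphased (0/1) calls alike.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # missing or partially missing call: ignore this sample
        # Sort so that 0/1 and 1/0 count as the same genotype.
        seen.add(tuple(sorted(alleles)))
    return seen


def is_variant_site(line):
    """True if at least two called samples differ in genotype on this data line."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 10:
        return True  # assumption: lines without sample columns are kept untouched
    fmt = cols[8].split(":")
    if "GT" not in fmt:
        return True  # assumption: sites without a GT field are kept untouched
    return len(called_genotypes(cols[9:], fmt.index("GT"))) > 1


def filter_vcf(lines):
    """Yield header lines unchanged, plus data lines that pass is_variant_site."""
    for line in lines:
        if line.startswith("#") or is_variant_site(line):
            yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(filter_vcf(src))
```

Saved as, say, `drop_invariant.py`, it would be run as `python drop_invariant.py input.vcf output.vcf`. Note that a site where every sample is missing is also dropped under this rule, since no two called genotypes differ.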