Question: Filter multisample vcf file with SNPs from one individual?
22 months ago by
pacbioslaves wrote:

We have whole-genome resequenced strains (Illumina) of a non-model yeast, mapped with BWA to a reference genome that was sequenced with PacBio. From the mapping files, we used GATK for joint-genotyping variant calling, followed by filtering. The type strain used for the high-depth PacBio sequencing was also sequenced with Illumina, and we want to remove the SNPs that are common to this type strain and all the other resequenced strains, so that we can ascertain only the SNPs that are unique to the resequenced population. The type strain is not related to the remaining population, but we feel that the "artifactual" SNPs from this strain are making the data noisy.

So simply, how do we remove SNP columns that are common to the resequenced population and the type strain?
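
One way to read this request, as a minimal sketch rather than a tested pipeline: drop every record of the multisample VCF in which the type-strain sample carries a non-reference allele. The sample name `TYPE`, the function name, and the toy records are hypothetical. The sketch assumes GT is the first FORMAT subfield (as the VCF specification requires when GT is present) and treats a missing type-strain call as "keep", which may or may not be what you want.

```python
from typing import Iterable, Iterator


def drop_sites_called_in(vcf_lines: Iterable[str], sample: str) -> Iterator[str]:
    """Yield header lines plus records where `sample` carries no ALT allele."""
    sample_col = None
    for line in vcf_lines:
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            sample_col = fields.index(sample)  # ValueError if the sample is absent
            yield line
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line before the first record")
        # Assumption: GT is the first colon-separated subfield of the sample column.
        genotype = fields[sample_col].split(":")[0]
        alleles = genotype.replace("|", "/").split("/")
        # Keep the site only if every allele is reference ("0") or missing (".").
        if all(allele in ("0", ".") for allele in alleles):
            yield line


# Toy multisample VCF, made up for illustration only.
toy_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTYPE\tS1\tS2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t1/1\t1/1\t0/1",
    "chr1\t250\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t1/1\t0/0",
    "chr2\t75\t.\tG\tA\t50\tPASS\t.\tGT:DP\t./.:0\t0/1:12\t1/1:9",
]

kept = list(drop_sites_called_in(toy_vcf, "TYPE"))
kept_positions = [l.split("\t")[1] for l in kept if not l.startswith("#")]
print(kept_positions)
```

The type-strain column itself is left in the output; removing it afterwards is a separate step.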

I have tried this with vcftools and plink and so far been unsuccessful.

Thanks in advance.


Can we see a sample of your VCF, please, along with the expected output?

Comment written 22 months ago by Pierre Lindenbaum
22 months ago by
Walnut Creek, USA
Brian Bushnell wrote:

The BBMap package has a tool called "" which can do various set operations on VCF files. For example:

    in=strain.vcf,common.vcf out=unique.vcf subtract

...will give you variants in "strain" not in "common", while...

    in=strain.vcf,common.vcf out=shared.vcf intersection

...will give you the variants shared with "common".
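
To make the two set operations concrete, here is a rough Python sketch of the idea. It is not how the BBMap tool is implemented. It assumes two variants are "the same" when CHROM, POS, REF and ALT all match, and the toy records and the `variant_keys` helper are made up for illustration.

```python
from typing import Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def variant_keys(vcf_lines: Iterable[str]) -> Set[VariantKey]:
    """Collect one key per ALT allele from the data lines of a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):  # split multi-allelic sites
            keys.add((chrom, int(pos), ref, allele))
    return keys


# Toy records, made up for illustration only.
strain = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t250\t.\tC\tT\t50\tPASS\t.",
    "chr2\t75\t.\tG\tA,C\t50\tPASS\t.",
]
common = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr2\t75\t.\tG\tC\t60\tPASS\t.",
]

strain_keys = variant_keys(strain)
common_keys = variant_keys(common)

unique = sorted(strain_keys - common_keys)  # "subtract": in strain, not in common
shared = sorted(strain_keys & common_keys)  # "intersection": in both
print(unique)
print(shared)
```

Matching on position alone, or normalising indels first, would give different answers, so the choice of key is worth deciding before trusting either output.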
