Filter multisample vcf file with SNPs from one individual?
8.3 years ago

We have whole-genome resequenced strains (Illumina) of a non-model yeast, mapped with BWA to a reference genome that was sequenced using PacBio. From the mapping files, we used GATK tools to do joint-genotyping variant calling, followed by filtering. The type strain that was used for high-depth PacBio sequencing was also sequenced with Illumina, and we want to remove the SNPs that are common to this type strain and all the other resequenced strains, so that we keep only the SNPs that are unique to the resequenced population. The type strain is not related to the remaining population, but we feel that the "artifactual" SNPs from this strain are making the data noisy.

So, simply: how do we remove SNP columns that are common to the resequenced population and the type strain?

I have tried this with vcftools and plink, but have so far been unsuccessful.

Thanks in advance.

SNP next-gen sequencing genome • 1.9k views

Can we see a sample of your VCF, please, along with the expected output?

8.3 years ago

The BBMap package has a tool called "comparevcf.sh" which can do various set operations on VCF files. For example:

comparevcf.sh in=strain.vcf,common.vcf out=unique.vcf subtract

...will give you variants in "strain" not in "common", while...

comparevcf.sh in=strain.vcf,common.vcf out=shared.vcf intersection

...will give you the variants shared with "common".
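To make the two set operations concrete, here is a minimal Python sketch of the same idea. It is not how comparevcf.sh is implemented; it simply assumes that two records describe the same variant when their CHROM, POS, REF and ALT match, and that a multi-allelic record counts as shared if any one of its ALT alleles is found in "common". The function names and the toy records are made up for illustration.

```python
def is_data(line):
    """True for VCF data lines, False for headers and blank lines."""
    return bool(line.strip()) and not line.startswith("#")


def record_keys(line):
    """Return one (CHROM, POS, REF, ALT) key per ALT allele of a record."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return {(chrom, pos, ref, allele) for allele in alt.split(",")}


def split_by_membership(strain_lines, common_lines):
    """Split strain records into (unique, shared) relative to common.

    Assumption: a record is "shared" if any of its ALT alleles also
    appears at the same CHROM/POS/REF in common_lines.
    """
    common = set()
    for line in common_lines:
        if is_data(line):
            common |= record_keys(line)

    unique, shared = [], []
    for line in strain_lines:
        if not is_data(line):
            continue
        (shared if record_keys(line) & common else unique).append(line)
    return unique, shared


# Toy records (hypothetical), tab-separated as in a real VCF.
strain = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.",
    "chr2\t300\t.\tG\tA\t50\tPASS\t.",
]
common = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t60\tPASS\t.",
    "chr2\t300\t.\tG\tC\t60\tPASS\t.",  # same site, different ALT
]

unique, shared = split_by_membership(strain, common)
```

Here "unique" corresponds to the subtract case and "shared" to the intersection case. Note that the chr2:300 record stays in "unique" because its ALT allele differs from the one in "common"; matching on position alone would be a different (looser) definition.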
