Question: Filtering VCFs and Phasing
2.1 years ago, miles.thorburn wrote:

I have been trying to phase 66 genomes, all contained in chromosome-specific VCF files, using the software ShapeIt. I have a working pipeline (it works if I use the --force option to override the error discussed below).

I get the following error:

ERROR: 15611 SNPs with high rates of missing data (>10%). These sites should be removed.

First I tried to use Plink to remove these SNPs, but the resulting VCF had seemingly lost a lot of information. I've since deleted the script, but I could probably figure out what I did if necessary.

Second, I found that VCFtools could remove the SNPs too. I used the following command:

vcftools --vcf $file --max-missing 0.1 --recode --recode-INFO-all --out $OUTDIR/"$newname"

This step only removes a few hundred SNPs, and the error message from ShapeIt indicates that 15461 of the missing data SNPs are still present. Have I misinterpreted the VCFtools manual, missed a parameter, or approached the problem incorrectly?

Thank you in advance for your help. I am still learning a lot as I go, and bioinformatics is certainly not my forte.

2.1 years ago, miles.thorburn wrote:

So I figured it out. It turns out it was simply a misunderstanding of the parameter. In the VCFtools step, --max-missing needs to be 0.9 or higher (I used 0.95 in the end). I believe this means that only variants with at most 5% missing data were kept. After testing a few different values, with --max-missing at 0.9 and above the number of SNPs in each chromosome matched the number of SNPs reported as data deficient for phasing.
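To make the direction of the threshold concrete, here is a minimal sketch in plain shell and awk that mimics the idea behind --max-missing on a tiny made-up VCF. It is not VCFtools itself: the function names `demo_vcf` and `call_rate_filter` are hypothetical, the data are invented, and it assumes that a genotype counts as missing whenever its GT field contains a ".", and that a site is kept when its call rate is at least the threshold. VCFtools may treat partially missing genotypes and exact boundary cases differently.

```shell
#!/bin/sh
# Hypothetical toy data: 3 sites, 4 samples, with 0, 1 and 3 missing genotypes.
demo_vcf() {
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n'
  printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\n'
  printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t./.\t0/1\t0/0\n'
  printf 'chr1\t300\t.\tG\tA\t.\tPASS\t.\tGT\t./.\t./.\t./.\t0/1\n'
}

# Hypothetical helper: read a VCF on stdin and print CHROM:POS for every site
# whose call rate (genotyped samples / all samples) is >= $1.
# Assumption: a GT value containing "." is treated as missing.
call_rate_filter() {
  awk -v min="$1" -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      called = 0
      total = NF - 9                   # sample columns start at field 10
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")              # GT is the first FORMAT subfield
        if (f[1] !~ /\./) called++
      }
      if (total > 0 && called / total >= min) print $1 ":" $2
    }'
}

echo "threshold 0.1:"
demo_vcf | call_rate_filter 0.1       # keeps all three sites (call rates 1.0, 0.75, 0.25)
echo "threshold 0.9:"
demo_vcf | call_rate_filter 0.9       # keeps only chr1:100
```

With a threshold of 0.1, even the site with 75% missing genotypes survives, which is consistent with the original command removing only a few hundred SNPs. Raising the value, as in --max-missing 0.95 in the VCFtools command above, is what actually tightens the filter.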


Yes, that is correct. Setting --max-missing to 0.9 means that only variants genotyped in at least 90% of your samples will be kept. The name of this parameter does not do justice to its actual usage.

Please feel free to accept your own answer (I have already up-voted it).

Reply written 2.1 years ago by Kevin Blighe