Question: Filtering VCFs and Phasing
11 weeks ago, miles.thorburn wrote:

I have been trying to phase 66 genomes, all contained in chromosome-specific VCF files, using the software ShapeIt. I have a working pipeline (it works if I use the --force option to override the error discussed below).

I get the following error:

ERROR: 15611 SNPs with high rates of missing data (>10%). These sites should be removed.

First I tried to use Plink to remove these SNPs, but the resulting VCF had seemingly lost a lot of information. I've since deleted the script, but I could probably figure out what I did if necessary.

Second, I found that VCFtools could remove the SNPs too. I used the following command:

vcftools --vcf $file --max-missing 0.1 --recode --recode-INFO-all --out $OUTDIR/"$newname"

This step only removes a few hundred SNPs, and the error message from ShapeIt indicates that 15461 of the missing data SNPs are still present. Have I misinterpreted the VCFtools manual, missed a parameter, or approached the problem incorrectly?

Thank you in advance for your help. I am still learning a lot as I go, and bioinformatics is certainly not my forte.

Tags: snp, vcftools, filtering, vcf, shapeit
11 weeks ago, miles.thorburn wrote:

So I figured it out. It turned out to be a simple misunderstanding of the parameter. In the VCFtools step, --max-missing needs to be higher than 0.9 (I used 0.95 in the end). I believe this means that only variants with at most 5% missing genotypes are kept. After testing a few different values, I found that with anything above 0.9 the number of SNPs in each chromosome matched the number of SNPs reported as data deficient for phasing.
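For anyone hitting the same confusion, here is a minimal sketch of the corrected filtering step. It assumes you think in terms of "maximum fraction of missing genotypes allowed per site" and converts that to the value --max-missing expects (1 minus that fraction). The function names max_missing_arg and filter_sites, and their argument order, are made up for illustration; only the vcftools flags themselves come from the command in the question.

```shell
# Convert "maximum fraction of missing genotypes allowed per site"
# into the value vcftools expects for --max-missing, which behaves
# like a minimum fraction of called genotypes (0.05 missing -> 0.95).
max_missing_arg() {
    awk -v m="$1" 'BEGIN { printf "%g\n", 1 - m }'
}

# Hypothetical wrapper, not part of vcftools:
#   filter_sites INPUT.vcf OUTPUT_PREFIX [MAX_MISSING_FRACTION]
# Defaults to allowing at most 5% missing genotypes per site.
filter_sites() {
    vcftools --vcf "$1" \
        --max-missing "$(max_missing_arg "${3:-0.05}")" \
        --recode --recode-INFO-all \
        --out "$2"
}
```

Called as, for example, filter_sites "$file" "$OUTDIR/$newname" 0.05, this should run the same command as in the question but with --max-missing 0.95 in place of 0.1.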


Yes, that is correct. Setting --max-missing to 0.9 means that only variants genotyped in at least 90% of your samples will be kept. The name of this parameter does not do justice to its actual usage.
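To check a filtered file before handing it to ShapeIt, the per-site missing rate can be counted directly from the VCF. The sketch below is only an illustration: count_high_missing_sites is a hypothetical helper, it assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT subfield, and it counts only fully missing genotypes (".", "./.", ".|."), which may not match exactly how ShapeIt or VCFtools tally missing data.

```shell
# Count sites whose fraction of missing genotypes exceeds a threshold.
# Usage: count_high_missing_sites FILE.vcf [MAX_MISSING_FRACTION]
# The threshold defaults to 0.1, i.e. the ">10%" in the ShapeIt message.
count_high_missing_sites() {
    vcf=$1
    threshold=${2:-0.1}
    awk -F'\t' -v thr="$threshold" '
        /^#/ { next }                      # skip meta and header lines
        {
            n = NF - 9                     # sample columns start at field 10
            if (n <= 0) next               # no genotype columns on this line
            missing = 0
            for (i = 10; i <= NF; i++) {
                split($i, parts, ":")      # GT assumed to be the first subfield
                # fully missing genotype: ".", "./." or ".|."
                if (parts[1] ~ "^[.]([/|][.])*$") missing++
            }
            if (missing / n > thr) high++
        }
        END { print high + 0 }             # print 0 when no site exceeds thr
    ' "$vcf"
}
```

After filtering with --max-missing 0.9 or higher, running count_high_missing_sites on the recoded VCF should, under the assumptions above, report 0.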

Please feel free to accept your own answer (I have already up-voted it).

written 11 weeks ago by Kevin Blighe