Filtering VCFs and Phasing
5.3 years ago
dthorbur ★ 1.9k

I have been trying to phase 66 genomes, stored in chromosome-specific VCF files, using ShapeIt. I have a working pipeline, but it only runs if I use the --force option to override the error described below.

I get the following error:

ERROR: 15611 SNPs with high rates of missing data (>10%). These sites should be removed.

First, I tried to use PLINK to remove these SNPs, but the resulting VCF seemed to have lost a lot of information. I have since deleted the script, but I could probably work out what I did if necessary.

Second, I found that VCFtools can also remove these SNPs. I used the following command:

vcftools --vcf "$file" --max-missing 0.1 --recode --recode-INFO-all --out "$OUTDIR/$newname"

This step only removes a few hundred SNPs, and the error message from ShapeIt indicates that 15461 of the missing data SNPs are still present. Have I misinterpreted the VCFtools manual, missed a parameter, or approached the problem incorrectly?
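To check independently how many sites exceed ShapeIt's 10% missing-data threshold (before or after filtering), a small awk script can count them directly from the VCF. This is a minimal sketch, not part of the original pipeline; `count_high_missing` is a hypothetical helper name, and it assumes an uncompressed VCF, GT as the first FORMAT field (the VCF spec requires GT first when present), and that any genotype whose GT contains "." counts as missing. ShapeIt's exact definition of "missing" may differ.

```shell
# Hypothetical helper: count sites whose fraction of missing genotypes
# is strictly greater than a threshold (default 0.1, i.e. >10%).
# Assumptions: uncompressed VCF, GT is the first FORMAT field, and a
# genotype is "missing" if its GT contains "." (./., .|., or .).
count_high_missing() {
  local vcf="$1" threshold="${2:-0.1}"
  awk -v thr="$threshold" -F'\t' '
    /^#/ { next }                       # skip meta-information and header lines
    {
      n = 0; miss = 0
      for (i = 10; i <= NF; i++) {      # sample columns start at column 10
        split($i, fmt, ":")             # fmt[1] holds the GT field
        n++
        if (fmt[1] ~ /\./) miss++
      }
      if (n > 0 && miss / n > thr) high++
    }
    END { print high + 0 }              # print 0 when no site exceeds the threshold
  ' "$vcf"
}
```

For example, `count_high_missing "$file" 0.1` should give a number comparable to the one in the ShapeIt error, and running it on the filtered output shows how many such sites remain.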

Thank you in advance for your help. I am still learning a lot as I go, and bioinformatics is certainly not my forte.

ShapeIt VCFtools filtering VCF SNP
5.3 years ago
dthorbur ★ 1.9k

So I figured it out. It turned out to be a simple misunderstanding of the parameters. In the VCFtools step, --max-missing needs to be set to 0.9 or above (I used 0.95 in the end). As I understand it, this means that only variants with at most 5% missing genotypes are kept. After testing a few different values, I found that with 0.9 and above the number of SNPs removed from each chromosome matched the number of SNPs that ShapeIt reported as having too much missing data for phasing.
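To illustrate why --max-missing 0.1 removed so few sites, here is a minimal awk sketch that mimics the documented meaning of the option: the value is the minimum fraction of samples that must have a called genotype, so 0.1 only drops sites that are more than 90% missing. This is an illustration, not the VCFtools implementation; `keep_min_called` is a hypothetical helper name, and it assumes an uncompressed VCF, GT as the first FORMAT field, and that a GT containing "." counts as missing.

```shell
# Hypothetical helper (illustration only, not VCFtools code): keep a site
# only if the fraction of samples with a called genotype is >= min_called.
# Assumptions: uncompressed VCF, GT is the first FORMAT field, and a
# genotype is "missing" if its GT contains ".".
keep_min_called() {
  local vcf="$1" min_called="$2"
  awk -v min="$min_called" -F'\t' '
    /^#/ { print; next }                # pass header lines through unchanged
    {
      n = 0; called = 0
      for (i = 10; i <= NF; i++) {      # sample columns start at column 10
        split($i, fmt, ":")             # fmt[1] holds the GT field
        n++
        if (fmt[1] !~ /\./) called++
      }
      if (n > 0 && called / n >= min) print
    }
  ' "$vcf"
}

# The corresponding VCFtools call with the value that worked in the answer:
# vcftools --vcf "$file" --max-missing 0.95 --recode --recode-INFO-all --out "$OUTDIR/$newname"
```

On a toy file where sites are 25%, 0% and 50% missing, a threshold of 0.1 keeps all three sites, while 0.9 keeps only the fully called one.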


Yes, that is correct. Selecting 0.9 for --max-missing means that only variants with a called (non-missing) genotype in at least 90% of your samples are kept. The name of this parameter does not do justice to how it is actually used.

Please feel free to accept your own answer (I have already upvoted it).
