Splitting VCF by sample: different files but same number of SNPs
3.2 years ago
greed ▴ 10

Hi there. I'm working with VCF files and I've noticed something odd. I split input.vcf into two sample groups (A and B) with the following commands: bcftools view -S sample_A_list.txt input.vcf > sample_A.vcf and bcftools view -S sample_B_list.txt input.vcf > sample_B.vcf. sample_A.vcf has 27 samples and sample_B.vcf has 116 samples.

If I count the SNPs in each VCF with egrep -v "^#" sample_A.vcf | wc -l or egrep -v "^#" sample_B.vcf | wc -l, I get the same result: 5997 SNPs for both files.

I then pruned the SNPs in linkage using the plink pipeline (R 0.4) and used the recode function to write 2 new VCFs. Running the same bash command after pruning gives very different totals: sample_B.vcf (with 116 samples) goes from 5997 SNPs to somewhat more than 2000, while sample_A.vcf (with 27 samples) goes from 5997 SNPs to somewhat more than 500.

So my first question is: if I have 2 different VCFs, why do I get the same total number of SNPs? My second question is: why is there such a large difference in the number of SNPs after LD pruning between my 2 files? Thank you for your answers and help.

VCF bcftools GWAS SNPs plink • 936 views
3.2 years ago

if I have 2 different VCFs, why do I get the same total number of SNPs?

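Because bcftools view --samples SAMPLEX only subsets the sample columns: it will not remove a variant when the only remaining genotype (for SAMPLEX) is homozygous REF (0/0). The same holds for the -S sample-list option used in the question, so both output files keep every site from input.vcf. You need an extra step, run on the subsetted file, that keeps only the sites where at least one remaining sample carries an ALT allele, like: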

bcftools view -i 'AC>0' one_sample.vcf
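To illustrate why the record count stays the same after splitting, here is a minimal sketch on a made-up 3-site VCF. It does not call bcftools: the hypothetical helper count_variant_sites uses awk to mimic what the AC>0 filter does after subsetting, assuming tab-separated columns, GT as the first FORMAT subfield, and sample columns starting at column 10. The sample names S1, S2, S3 and their grouping are invented for the example.

```shell
# Toy VCF: 3 sites; S1 and S2 play the role of group A, S3 of group B.
toy=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n'
  printf '1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/0\t0/1\n'
  printf '1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/1\t0/0\t0/0\n'
  printf '1\t300\t.\tG\tA\t.\t.\t.\tGT\t1/1\t0/1\t0/0\n'
} > "$toy"

# Hypothetical helper: count sites where at least one of the given
# sample columns (1-based column numbers) carries a non-REF allele.
count_variant_sites() {
  file=$1; shift
  awk -F'\t' -v cols="$*" '
    BEGIN { n = split(cols, c, " ") }
    /^#/ { next }                      # skip header lines
    {
      for (i = 1; i <= n; i++) {
        split($(c[i]), f, ":")         # f[1] is the GT subfield
        if (f[1] ~ /[1-9]/) { hits++; break }
      }
    }
    END { print hits + 0 }
  ' "$file"
}

# Counting non-header lines ignores genotypes, so every subset reports 3.
echo "records:           $(grep -vc '^#' "$toy")"
# Sites that are actually variable within each group differ.
echo "variable in A:     $(count_variant_sites "$toy" 10 11)"
echo "variable in B:     $(count_variant_sites "$toy" 12)"
```

On real data the bcftools command above is the tool to use; the awk helper only shows the logic on a file small enough to check by eye.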

OK, so if I understand correctly, I have to run this step on each VCF (sample_A.vcf and sample_B.vcf) generated by bcftools view.
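Applying the step to each file could look like the sketch below. It assumes bcftools is on the PATH and that the inputs are uncompressed VCFs; the function names and the .variant.vcf output suffix are illustrative choices, not something from the thread.

```shell
# Count data records (non-header lines) in an uncompressed VCF.
count_records() {
  grep -vc '^#' "$1"
}

# Keep only sites with at least one ALT allele among the retained samples,
# then report the record count before and after.
# Assumes bcftools is installed; the output name is a hypothetical choice.
filter_variant_sites() {
  in_vcf=$1
  out_vcf=${in_vcf%.vcf}.variant.vcf
  bcftools view -i 'AC>0' "$in_vcf" > "$out_vcf"
  printf '%s: %s -> %s records\n' \
    "$in_vcf" "$(count_records "$in_vcf")" "$(count_records "$out_vcf")"
}

# Usage, in the directory holding the two split files:
#   for f in sample_A.vcf sample_B.vcf; do filter_variant_sites "$f"; done
```

After this step the two files should no longer be expected to report the same number of records, since each keeps only the sites that vary within its own samples.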
