How do I remove pseudo-SNPs from a two-population subset of a multi-population VCF file?
7.3 years ago

Hi all,

I called SNPs for three populations with SAMtools and kept only biallelic SNPs using VCFtools. To calculate pairwise Fst between populations, I split the original VCF file into three sub-VCF files, each containing only two populations. Some sites that are variant in the original three-population VCF are no longer variant in a two-population sub-VCF: when the two retained populations carry the same allele and differ only from the third population, the site is a variant in the original file but monomorphic in the subset. So my question is: how can I delete these non-variant sites from the sub-VCF files? At these sites, all genotypes are the same homozygote, either 0/0 or 1/1.

Are there any scripts or software that can do this?
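To make the request concrete, here is a minimal sketch of the filter described above, written in plain Python with no external libraries. It assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT key, and it treats a site as non-variant when every non-missing genotype in the subset carries the same allele. The function names and the toy records are made up for illustration, and sites where every call is missing are dropped as well, which may or may not be what is wanted.

```python
import re

def observed_alleles(sample_fields):
    """Collect the allele indices seen in the GT part of each sample column."""
    alleles = set()
    for field in sample_fields:
        gt = field.split(":", 1)[0]           # assumes GT is the first FORMAT key
        for allele in re.split(r"[/|]", gt):  # handles unphased (/) and phased (|) calls
            if allele != ".":                 # ignore missing calls
                alleles.add(allele)
    return alleles

def drop_monomorphic(vcf_lines):
    """Yield header lines unchanged and only the records that still vary."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        columns = line.rstrip("\n").split("\t")
        # Columns 10 onward (index 9+) hold the per-sample genotype fields.
        if len(observed_alleles(columns[9:])) > 1:
            yield line

# Toy two-population subset (hypothetical data): only position 200 still varies.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/0\t0/0\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/1\t0/0\t1/1\n",
    "chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT:DP\t1/1:8\t1/1:5\t./.:0\n",
]
kept = list(drop_monomorphic(example))
```

For a real sub-VCF, the lines of an opened file can be passed to `drop_monomorphic` in place of `example`, and the result written out with `writelines`.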

Thanks all!

Sincerely,

Dezhi

SNP genome sequencing snp • 1.7k views

the original vcf file was divided into 3 sub-vcf files

Why? I ask because all the information is already there. Why do you need to remove some samples?


Hi Pierre,

If multiple populations are used to call SNPs but only pairwise comparisons are needed, some sites may no longer be SNPs within a given pair. Also, calling on multiple populations and keeping only biallelic sites would drop SNPs that are biallelic within the two focal populations, because such SNPs may become triallelic or tetra-allelic once the other populations are taken into account. Judging from the many published studies, though, this does not seem to matter.

This is just my personal opinion, and I don't know whether it is correct. Your comments and criticism are very welcome.
