setting the missing genotypes as 0/0
7 months ago

I have two VCF files from different projects. One has 200 animals, the other has 4000. Before merging them into one file, I realized that both VCF files have some missing genotypes, shown as ./.

bcftools merge has an option to set the missing genotypes as 0/0. I am not completely sure whether I should use this option. In the next step I will add annotation and split the merged file into separate VCF files for each population. Finally, I will look at the allele frequencies of the populations. Does 0/0 affect the AF calculation?

Thanks for the suggestions in advance!

missing genotypes vcf merge bcftools • 227 views
7 months ago
4galaxy77 ★ 1.0k

Yes, setting missing genotypes to "0/0" in the merge will affect the allele frequency calculations. This option is useful if you know that you previously removed positions where everyone was "0/0", so that when you merge the files back together, the "0/0"s are correctly filled in.

However, if the genotypes are sporadically missing in the samples, then you should leave them as missing, since they may have been removed due to e.g. poor quality, and you can't assume they were "0/0". I would recommend this as the 'safer' option, unless you specifically know the missing genotypes are all "0/0".
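To make the effect on AF concrete, here is a minimal Python sketch. It is only an illustration of the arithmetic, not how bcftools computes anything: the function name `alt_allele_frequency` and the eight genotypes are made up for the example, and every non-reference allele is counted as ALT.

```python
from fractions import Fraction

def alt_allele_frequency(genotypes):
    """Return the ALT allele frequency over called alleles, or None if none are called."""
    alt = 0
    called = 0
    for gt in genotypes:
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele: excluded from numerator and denominator
            called += 1
            if allele != "0":
                alt += 1  # any non-reference allele counts as ALT (simplification)
    return Fraction(alt, called) if called else None

# Hypothetical site: 4 called samples and 4 missing ones.
observed = ["0/1", "1/1", "0/0", "0/1", "./.", "./.", "./.", "./."]
# Same site after replacing every missing genotype with 0/0.
filled = ["0/0" if gt == "./." else gt for gt in observed]

af_missing = alt_allele_frequency(observed)  # 4 ALT out of 8 called alleles
af_filled = alt_allele_frequency(filled)     # 4 ALT out of 16 alleles

print(af_missing, af_filled)  # prints: 1/2 1/4
```

In this toy case, filling the missing calls with 0/0 halves the estimated ALT frequency, because the denominator grows while the ALT count stays the same.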


Thanks for the response!

I used this command to merge my files, setting the missing genotypes as 0/0:

bcftools merge -m none -0 file1.vcf.gz file2.vcf.gz > mergedfile.vcf

Both VCF files also contain unshared variants. I guess this will also create missing genotypes in the merged file. In that case, what should I use?


If the two files contain unshared variants, then don't set the missing genotypes to "0/0". Imagine you have a SNP genotyped in one set and not the other: when you merge them with -0, the samples from the other set are automatically filled in as "0/0" at that site, but that's just guessing and half of them might be wrong! Best just to leave them as missing if you are calculating allele frequencies.
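As a rough illustration of the unshared-variant case with cohort sizes like yours (200 and 4000), here is a toy Python model of a single site that exists only in the smaller file. The helper names `merge_site` and `alt_af` and the genotype counts are hypothetical; this mimics the idea of padding with ./. versus 0/0 and is not bcftools' actual implementation.

```python
from fractions import Fraction

MISSING = "./."
REF_HOM = "0/0"

def merge_site(gts_a, gts_b, n_a, n_b, missing_to_ref=False):
    """Toy merge of one site across two cohorts.

    gts_a / gts_b are genotype lists, or None when the site is absent
    from that cohort's file. An absent cohort is padded with ./. by
    default, or with 0/0 when missing_to_ref is True.
    """
    fill = REF_HOM if missing_to_ref else MISSING
    a = gts_a if gts_a is not None else [fill] * n_a
    b = gts_b if gts_b is not None else [fill] * n_b
    return a + b

def alt_af(genotypes):
    """ALT allele frequency over called alleles only (None if nothing is called)."""
    alleles = [al for gt in genotypes for al in gt.split("/") if al != "."]
    if not alleles:
        return None
    return Fraction(sum(al != "0" for al in alleles), len(alleles))

# Hypothetical site typed only in the 200-animal file: 100 hets, 100 hom-ref.
small = ["0/1"] * 100 + ["0/0"] * 100

kept_missing = merge_site(small, None, 200, 4000)
forced_ref = merge_site(small, None, 200, 4000, missing_to_ref=True)

print(alt_af(kept_missing))  # 100 ALT / 400 called alleles   -> 1/4
print(alt_af(forced_ref))    # 100 ALT / 8400 total alleles   -> 1/84
```

With the site left missing in the 4000 untyped animals, the estimate stays at the value observed in the typed cohort; forcing 0/0 drags it down to about 1.2%, purely because of the padding.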
