Count missing and non-missing values in a VCF file
18 months ago
LDT ▴ 330

Dear all,

I have a VCF with hundreds of populations, and I want to count how many missing and non-missing genotypes there are in each sample, and then for each variant across all samples.

Does anyone have a tip on how to calculate these stats?

Thank you for your time.

Tags: vcf, bcftools, vcftools

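See also the thread "How to calculate % of missing genotypes in a VCF file". It is much easier to use plink2 --missing (e.g. plink2 --vcf file.vcf --missing), which writes both a per-sample (.smiss) and a per-variant (.vmiss) missingness report.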

18 months ago
iraun 6.2k

Maybe the thread "How to calculate the number of SNPs in each sample in a multi-sample VCF file" helps?

For the missing genotypes, I have written this simple awk script to count the number of ./. genotypes per sample. Let me know if it works.

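BEGIN { FS = "\t" }                  # VCF columns are tab-separated
/^##/ { next }                       # skip meta-information lines
/^#CHROM/ {                          # header line: sample names start at column 10
    for (i = 10; i <= NF; i++)
        f[i] = $i
    ncol = NF
    next
}
{
    for (i = 10; i <= NF; i++) {
        # GT is the first FORMAT subfield, so anchor the match at the start
        # (unphased ./. only; phased .|. calls are not counted)
        if ($i ~ /^\.\/\./)
            b[i]++
    }
}
END {
    for (i = 10; i <= ncol; i++)
        printf "%s %d\n", f[i], b[i]   # %d prints 0 for samples with no ./. calls
}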

Save the code above in a file (for example, sc.awk) and run it with:

awk -f sc.awk multi_sample.vcf
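If awk is awkward to extend, here is a minimal Python sketch of the same per-sample count. It assumes an uncompressed, tab-delimited VCF in which GT is the first FORMAT subfield (the VCF specification requires this when GT is present). Unlike the awk pattern, it also treats phased .|. and haploid . calls as missing, and it reports non-missing counts alongside. The function names and the toy records are hypothetical, for illustration only.

```python
import io
from typing import Dict, Iterable, List


def is_missing(gt: str) -> bool:
    """True if every allele in a GT string is '.', e.g. './.', '.|.' or '.'."""
    alleles = gt.replace("|", "/").split("/")
    return all(a == "." for a in alleles)


def per_sample_missing(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Return {sample: [n_missing, n_non_missing]} over all variant records.

    Partially missing calls such as './1' are counted as non-missing here.
    """
    samples: List[str] = []
    counts: Dict[str, List[int]] = {}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            counts = {s: [0, 0] for s in samples}
            continue
        for sample, value in zip(samples, fields[9:]):
            gt = value.split(":")[0]  # GT is the first FORMAT subfield
            counts[sample][0 if is_missing(gt) else 1] += 1
    return counts


# Toy two-sample VCF (hypothetical records) to show the output shape.
demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT:DP\t.|.:0\t1/1:12\n"
)
print(per_sample_missing(demo))  # {'A': [1, 1], 'B': [1, 1]}
```

For a real file, pass an open handle such as open("multi_sample.vcf"), or gzip.open(path, "rt") for a bgzipped VCF.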

Yes, iraun, that's great! Any ideas on how to calculate the missing values for each variant? Thank you so much for your time.
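For the per-variant counts, a similar Python sketch could work, under the same assumptions (uncompressed VCF, GT first in FORMAT). It skips every header line and, for each record, counts missing versus called genotypes across all samples and reports the missing fraction. The function name and the toy three-sample records are hypothetical.

```python
import io
from typing import Iterable, List, Tuple


def is_missing(gt: str) -> bool:
    """True if every allele in a GT string is '.', e.g. './.', '.|.' or '.'."""
    return all(a == "." for a in gt.replace("|", "/").split("/"))


def per_variant_missing(lines: Iterable[str]) -> List[Tuple[str, str, int, int, float]]:
    """Return (CHROM, POS, n_missing, n_non_missing, missing_fraction) per record."""
    rows = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the #CHROM header and blank lines
        fields = line.rstrip("\n").split("\t")
        gts = [value.split(":")[0] for value in fields[9:]]  # GT of each sample
        n_missing = sum(is_missing(gt) for gt in gts)
        frac = n_missing / len(gts) if gts else 0.0  # sites-only VCFs have no samples
        rows.append((fields[0], fields[1], n_missing, len(gts) - n_missing, frac))
    return rows


demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB\tC\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t.|.\t./.\t1/1\n"
)
for chrom, pos, miss, called, frac in per_variant_missing(demo):
    print(chrom, pos, miss, called, f"{frac:.2f}")
# 1 100 1 2 0.33
# 1 200 2 1 0.67
```

Because it keeps one row per record, the result can be filtered directly, e.g. to list sites whose missing fraction exceeds a chosen threshold.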


I have edited my answer with a possible solution for the missing genotypes per sample (in your previous comment you said "for each variant", but I think you mean for each sample, right?).

By the way, the script is adaptable: if you replace the pattern \.\/\. with 1\/1, it will count homozygous alternate genotypes; use 0\/1 for heterozygous genotypes, and so on.
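Rather than rerunning the script once per pattern, the genotype classes can also be tallied in one pass. Below is a minimal Python sketch of that idea; the class labels, the treatment of partially missing calls such as ./1 as a separate "partial" class, and the function names are assumptions made for illustration. Phasing is ignored, and haploid calls fall into hom_ref or hom_alt.

```python
import io
from collections import Counter
from typing import Dict, Iterable, List


def classify(gt: str) -> str:
    """Map a GT string to a coarse class (hypothetical labels), ignoring phasing."""
    alleles = gt.replace("|", "/").split("/")
    if all(a == "." for a in alleles):
        return "missing"
    if "." in alleles:
        return "partial"  # e.g. './1'
    if len(set(alleles)) > 1:
        return "het"  # includes multi-allelic hets such as '1/2'
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def per_sample_classes(lines: Iterable[str]) -> Dict[str, Counter]:
    """Tally genotype classes per sample in a single pass over the VCF."""
    samples: List[str] = []
    tallies: Dict[str, Counter] = {}
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start at column 10
            tallies = {s: Counter() for s in samples}
            continue
        for sample, value in zip(samples, fields[9:]):
            tallies[sample][classify(value.split(":")[0])] += 1
    return tallies


demo = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB\tC\n"
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t./.\t0/0\n"
    "1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t.|.\t./1\t1|1\n"
)
for sample, tally in per_sample_classes(demo).items():
    print(sample, dict(tally))
# A {'het': 1, 'missing': 1}
# B {'missing': 1, 'partial': 1}
# C {'hom_ref': 1, 'hom_alt': 1}
```

Summing all classes except "missing" for a sample gives its non-missing count.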

18 months ago
raphael.B ▴ 520

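I think bcftools stats -S samples.txt file.vcf provides this kind of information, among other statistics: the per-sample counts, including missing genotypes, are reported in the PSC section of its output.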

