7.1 years ago
William
PLINK and VCFtools can be used to create per-sample (.imiss) and per-variant (.lmiss) missingness files from VCF files.
https://www.cog-genomics.org/plink2/basic_stats#missing
http://vcftools.sourceforge.net/man_latest.html (--missing-indv, --missing-site)
Example first lmiss file records:
CHR SNP N_MISS N_GENO F_MISS
chr_1 id_1 10 100 0.10
chr_1 id_2 20 100 0.20
chr_1 id_3 30 100 0.30
potentially millions more variants
Example first imiss file records:
FID IID MISS_PHENO N_MISS N_GENO F_MISS
Sample_1 Sample_1 Y 5000 100000 0.05
Sample_2 Sample_2 Y 10000 100000 0.10
Sample_3 Sample_3 Y 15000 100000 0.15
Sample_4 Sample_4 Y 20000 100000 0.20
Sample_5 Sample_5 Y 50000 100000 0.50
potentially thousands more samples
I am probably not the first (or last) person who would like to plot this data to get an idea of the general missingness of their data.
Does anyone have a good existing (Python) script for this?
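
For reference, here is a minimal sketch of what such a script could look like, not an existing tool. It assumes the files are whitespace-delimited with a header row containing an F_MISS column, as in the examples above. The function names (read_f_miss, bin_counts, plot_histogram) are made up for illustration, and the plotting step assumes matplotlib is installed.

```python
import io
import math


def read_f_miss(handle):
    """Return the F_MISS column of a whitespace-delimited missingness file."""
    header = handle.readline().split()
    col = header.index("F_MISS")  # find the column by name, not by position
    values = []
    for line in handle:
        fields = line.split()
        if len(fields) <= col:
            continue  # skip blank or truncated lines
        try:
            value = float(fields[col])
        except ValueError:
            continue  # skip entries that are not numbers
        if not math.isnan(value):  # drop "nan" entries
            values.append(value)
    return values


def bin_counts(values, n_bins=10):
    """Count values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for value in values:
        index = min(int(value * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        counts[index] += 1
    return counts


def plot_histogram(values, title, out_png, n_bins=50):
    """Save a histogram of missingness fractions as a PNG (needs matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.hist(values, bins=n_bins, range=(0.0, 1.0))
    ax.set_xlabel("F_MISS (fraction of missing genotypes)")
    ax.set_ylabel("count")
    ax.set_title(title)
    fig.savefig(out_png)
    plt.close(fig)


# Small demo using the example imiss records from the question.
example = io.StringIO(
    "FID IID MISS_PHENO N_MISS N_GENO F_MISS\n"
    "Sample_1 Sample_1 Y 5000 100000 0.05\n"
    "Sample_2 Sample_2 Y 10000 100000 0.10\n"
    "Sample_3 Sample_3 Y 15000 100000 0.15\n"
)
print(bin_counts(read_f_miss(example), n_bins=10))
```

For real data, something like `with open("plink.imiss") as fh: plot_histogram(read_f_miss(fh), "Per-sample missingness", "imiss.png")` would produce the per-sample plot, and the same call on the .lmiss file gives the per-variant one. The file names here are placeholders. Reading line by line keeps memory use modest even with millions of variants, since only the F_MISS values are stored.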