How to plot VCF missingness per sample (.imiss) and per variant (.lmiss) in python?
7.1 years ago
William ★ 5.3k

PLINK and VCFtools can be used to create missingness-per-sample (.imiss) and missingness-per-variant (.lmiss) files from VCF files.

https://www.cog-genomics.org/plink2/basic_stats#missing

http://vcftools.sourceforge.net/man_latest.html (--missing-indv, --missing-site)

Example first lmiss file records:

CHR      SNP        N_MISS   N_GENO   F_MISS
chr_1    id_1       10       100      0.10
chr_1    id_2       20       100      0.20
chr_1    id_3       30       100      0.30
potentially millions more variants

Example first imiss file records:

   FID       IID       MISS_PHENO   N_MISS   N_GENO   F_MISS
  Sample_1   Sample_1  Y            5000     100000   0.05
  Sample_2   Sample_2  Y            10000    100000   0.10
  Sample_3   Sample_3  Y            15000    100000   0.15
  Sample_4   Sample_4  Y            20000    100000   0.20
  Sample_5   Sample_5  Y            50000    100000   0.50
  potentially thousands more samples

I am probably not the first (or last) person who would like to plot this data to get an idea of the overall missingness of their data.

Does anyone have a good existing (Python) script for this?
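To make the request concrete, here is a minimal sketch of the kind of script I have in mind, not a tested tool. It assumes the files are whitespace-delimited with a header row containing an F_MISS column, as in the examples above, and that matplotlib is installed. The function names, the output file name, and the bin count are placeholders I made up.

```python
import math


def read_f_miss(path):
    """Return the F_MISS column of a whitespace-delimited .imiss/.lmiss file."""
    values = []
    with open(path) as handle:
        # Locate the column by name so the column order does not matter.
        col = handle.readline().split().index("F_MISS")
        for line in handle:
            fields = line.split()
            if len(fields) <= col:
                continue  # skip blank or truncated lines
            try:
                value = float(fields[col])
            except ValueError:
                continue  # skip anything that is not a number
            if math.isfinite(value):  # drop 'nan' entries
                values.append(value)
    return values


def plot_missingness(imiss_path, lmiss_path, out_png="missingness.png", bins=50):
    """Save side-by-side histograms of per-sample and per-variant F_MISS."""
    # Imported here so that read_f_miss works without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to file, no display needed
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    panels = ((imiss_path, "samples"), (lmiss_path, "variants"))
    for ax, (path, label) in zip(axes, panels):
        ax.hist(read_f_miss(path), bins=bins, range=(0.0, 1.0))
        ax.set_xlabel("Fraction of missing genotypes (F_MISS)")
        ax.set_ylabel("Number of " + label)
        ax.set_title("Missingness per " + label[:-1])
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)


# Hypothetical usage (file names are placeholders):
# plot_missingness("mydata.imiss", "mydata.lmiss")
```

A histogram seemed the obvious choice because it stays readable with millions of variants, but I would be happy to hear about better plot types.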

qc plink vcftools vcf • 3.8k views