Question: How to check which samples have the most uncalled genotypes in a multi-sample VCF
20 months ago by
BAGeno170 wrote:


I have a multi-sample VCF in which many sites have uncalled or missing genotypes. Is there a way to check which sample has the greatest number of uncalled genotypes, so that I can exclude that sample from further analysis?

genotype sample missing • 570 views

Hello BAGeno,

See my answer in this thread. You just have to adapt the genotype in the awk script or, if it's a small file and speed doesn't matter, use this simpler one.

fin swimmer

20 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum127k wrote:

A one liner using bioalcidaejdk:

$ java -jar dist/bioalcidaejdk.jar -e 'stream().flatMap(G->G.getGenotypes().stream()).filter(G->!G.isCalled()).map(G->G.getSampleName()).collect(Collectors.groupingBy(Function.identity(), Collectors.counting())).forEach((K,V)->println(K+"\t"+V));' src/test/resources/test_vcf01.vcf  | sort -t $'\t' -k2,2n

S3  8
S4  9
S5  14
S6  18
S2  23
S1  73
  • stream(): get a stream of variants
  • flatMap(G->G.getGenotypes().stream()): map to a stream of genotypes
  • filter(G->!G.isCalled()): keep only the uncalled genotypes
  • map(G->G.getSampleName()): map each genotype to its sample name
  • collect(Collectors.groupingBy(Function.identity(), Collectors.counting())): collect into an associative array of sample/count
  • forEach((K,V)->println(K+"\t"+V)): print the results
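If bioalcidaejdk is not at hand, the same counting idea can be sketched in plain Python on an uncompressed, tab-delimited VCF. This is only a rough equivalent, not the tool's implementation: the function name `count_uncalled` and the demo records are made up for illustration, and the sketch assumes a genotype is "uncalled" when every allele in its GT field is `.` (or GT is absent), which may differ from `isCalled()` in edge cases.

```python
import re


def count_uncalled(vcf_lines):
    """Return {sample_name: number_of_uncalled_genotypes} for a text VCF.

    Assumption: a genotype is uncalled when all of its GT alleles are '.',
    e.g. './.', '.|.' or '.'. Partially called genotypes such as './1'
    are counted as called.
    """
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            counts = {name: 0 for name in samples}
            continue
        if len(fields) < 10:
            continue  # record without genotype columns
        format_keys = fields[8].split(":")
        gt_index = format_keys.index("GT") if "GT" in format_keys else None
        for name, value in zip(samples, fields[9:]):
            parts = value.split(":")
            if gt_index is None or gt_index >= len(parts):
                gt = "."  # no GT available for this sample
            else:
                gt = parts[gt_index]
            alleles = re.split(r"[/|]", gt)
            if all(allele == "." for allele in alleles):
                counts[name] += 1
    return counts


# Made-up three-sample VCF, for demonstration only.
demo = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t./.\t0/1\t1/1",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT:DP\t./.:0\t.|.:0\t0/0:12",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t.\t0/1\t./1",
])

demo_counts = count_uncalled(demo.splitlines())

# Print samples sorted by count, ascending, like the `sort -k2,2n` step.
for name, n in sorted(demo_counts.items(), key=lambda kv: kv[1]):
    print(f"{name}\t{n}")
```

For a real file, pass an open file handle instead of the demo lines, e.g. `with open("input.vcf") as fh: counts = count_uncalled(fh)` (the file name is a placeholder); for a gzip-compressed VCF, `gzip.open(path, "rt")` can be used the same way. Unlike the stream-based one-liner, this sketch also lists samples that have zero uncalled genotypes.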