Genotype count in a VCF file
4
0
Entering edit mode
7.5 years ago
akang ▴ 110

From a VCF file how can i count how many samples have 0/0, 0/1, 1/1 genotypes on a given locus? Ill appreciate any help!

SNP vcf • 7.1k views
ADD COMMENT
2
Entering edit mode
7.5 years ago
sacha ★ 2.4k

The following command counts occurence of genotype in a VCF file.

  cat test.vcf|grep -oE "([0-2]/[0-2])"|sort | uniq -c

You can filter for a specific locus before :

  cat test.vcf|awk '$1==CHROM && $2==POS {print $0}' | grep -oE "([0-2]/[0-2])"|sort | uniq -c
ADD COMMENT
0
Entering edit mode

I think the solution provided would tell total 0/1 or 1/0 count per vcf file. He wants how many samples have 0/0, 0/1, 1/1 genotypes on a given locus. So suppose there are 10 samples in a vcf file, then there would be 10 genotypes at each locus corresponding to 10 sample. So 0/0, 0/1, 1/1 distribution has to be wrt 10 samples.

ADD REPLY
0
Entering edit mode

It does work (it summarizes) but I assume OP wanted counts per line/variant.

ADD REPLY
0
Entering edit mode

could it be possible to print the sample names as well?

ADD REPLY
2
Entering edit mode
7.5 years ago

here is a simple perl alternative:

zcat ALL.chr7.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
| perl -ane '
/^#/ and next;
%c = ();
foreach (@F[9..$#F]) { /^([^:]+)/ and $c{$1}++ }
print "$F[0]\t$F[1]";
foreach $gt (sort keys %c) { print "\t$gt:$c{$gt}" }
print "\n"
'
ADD COMMENT
1
Entering edit mode
7.5 years ago

I have not tried. But probably it will work. Let me know if it does not work.

grep -v '^#' temp.vcf | perl -lne '@x=split /\s+/,$_;$var11=grep{/1\/1/}@x[9..$#x]; $var01=grep{/0\/1/}@x[9..$#x];$var10=grep{/1\/0/}@x[9..$#x];print "$x[0]\t$x[1]\t$var01\t$var10\t$var11\n"'

Thanks

Priyabrata

Persistent LABS

ADD COMMENT
0
Entering edit mode

Although I haven't checked whether it's correct, it does work ;)

ADD REPLY
1
Entering edit mode
7.5 years ago

using bioalcidae: https://github.com/lindenb/jvarkit/wiki/BioAlcidae

cat ILLUMINA.wex.broad_phase2_baseline.20111114.both.exome.genotypes.1000.vcf | 
java -jar src/jvarkit/dist/bioalcidae.jar -F VCF -e 'while(iter.hasNext()) {var ctx = iter.next();m={};for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i);var t=g.getType().name();if(t in m) {m[t]++;} else {m[t]=1;};} out.println(ctx.getContig()+" "+ctx.getStart()+" "+JSON.stringify(m)); }'

1 69453 {"NO_CALL":748,"HOM_REF":426,"HOM_VAR":3,"HET":5}
1 69486 {"NO_CALL":747,"HOM_REF":434,"HET":1}
1 69496 {"NO_CALL":749,"HOM_REF":429,"HET":4}
1 69511 {"NO_CALL":693,"HOM_VAR":392,"HET":72,"HOM_REF":25}
1 69534 {"NO_CALL":755,"HOM_REF":426,"HET":1}
1 69541 {"NO_CALL":761,"HOM_REF":420,"HET":1}
1 69552 {"NO_CALL":757,"HOM_REF":419,"HET":6}
1 69590 {"NO_CALL":776,"HOM_REF":404,"HET":2}
1 69610 {"NO_CALL":783,"HOM_REF":392,"HET":6,"HOM_VAR":1}
1 69635 {"NO_CALL":789,"HOM_REF":391,"HET":2}
1 69761 {"NO_CALL":886,"HOM_REF":276,"HET":13,"HOM_VAR":7}
1 69768 {"NO_CALL":911,"HOM_REF":270,"HET":1}
1 69897 {"NO_CALL":999,"HOM_VAR":131,"HET":24,"HOM_REF":28}
1 69968 {"NO_CALL":1152,"HOM_REF":29,"HET":1}
1 861275 {"HOM_REF":1181,"HET":1}
1 861292 {"HOM_REF":1179,"HET":3}
ADD COMMENT
0
Entering edit mode

Hi Pierre, could we print the sample names also in separate columns corresponding to the counts? BTW, i could not find the jar in the git, is it moved to a different location or name?

ADD REPLY
0
Entering edit mode

, i could not find the jar in the git,

you need to compile the program.

could we print the sample names also in separate columns corresponding to the counts

yes , just loop over the m object. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/keys

ADD REPLY
0
Entering edit mode

I tried getSampleName() in the print statement. It won't do what i need, it is just printing one name for all the variants. Seems there is a little more complicated solution that needs some intuition. Could you help.

out.println(ctx.getContig()+" "+ctx.getStart()+" "+JSON.stringify(m)+" "+g.getSampleName())
ADD REPLY

Login before adding your answer.

Traffic: 2553 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6