I have an annotated VCF of 24 samples and would like a table of the number of variants per gene per sample. Is there an established way to do this?
Running snpEff count:
java -Xmx4g -jar snpEff/snpEff.jar count -v hg19 all_multi.vcf > test_count.txt
only produces counts, not per-sample information.
Meanwhile, this bcftools command:
bcftools query -i'GT="1"' -f'[%SAMPLE %GT \n]' all_multi.vcf.gz | awk '{print $1}' | sort | uniq -c > sample.counts
produces no output at all.
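If the goal of that command is a per-sample count of sites where the sample carries a non-reference genotype, one possible workaround is to drop the `-i` filter and do the genotype test in awk instead. This is a minimal sketch, assuming tab-separated `SAMPLE GT` lines on stdin; the function name `count_alt_per_sample` is hypothetical.

```shell
# Count sites with a non-reference genotype per sample.
# Expected input (tab-separated, one line per sample per site): SAMPLE  GT
count_alt_per_sample() {
  # A GT string carries an ALT allele if it contains any digit 1-9
  # (e.g. 0/1, 1|1, 0/2); 0/0 and ./. do not match.
  awk -F'\t' '$2 ~ /[1-9]/ { n[$1]++ }
    END { for (s in n) print s "\t" n[s] }' | sort
}
```

Usage (assuming bcftools is on the PATH): `bcftools query -f '[%SAMPLE\t%GT\n]' all_multi.vcf.gz | count_alt_per_sample > sample.counts`. Note this counts sites per sample only; it says nothing about genes.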
Any ideas/suggestions would be much appreciated. Thanks in advance!
Thank you, that seems to work, but I have a problem with duplicate entries, which causes errors in all my downstream analysis and data wrangling. Please see the example below.
Thanks in advance for any suggestions!
Of course: there is more than one transcript per gene, and each transcript gets its own annotation, hence the duplicates.
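One way to avoid those duplicates is to count each (sample, site, gene) combination only once, however many transcripts it is annotated against. The sketch below assumes snpEff's `ANN` field, where entries are comma-separated and `Gene_Name` is the 4th `|`-separated field of each entry, and it expects tab-separated `SAMPLE CHROM POS REF ALT GT ANN` lines on stdin. The function name `count_variants_per_gene` is hypothetical, and for multiallelic sites it does not check which ALT allele a given annotation refers to.

```shell
# Count variant sites per gene per sample, ignoring per-transcript duplicates.
# Expected input (tab-separated, one line per sample per site):
#   SAMPLE  CHROM  POS  REF  ALT  GT  ANN
count_variants_per_gene() {
  awk -F'\t' '
    {
      # Keep only genotypes carrying at least one non-reference allele.
      n = split($6, alleles, "[/|]")
      carrier = 0
      for (i = 1; i <= n; i++) {
        if (alleles[i] != "0" && alleles[i] != ".") carrier = 1
      }
      if (!carrier) next

      # ANN holds one comma-separated entry per transcript/effect;
      # field 4 of each "|"-separated entry is Gene_Name.
      m = split($7, entries, ",")
      for (j = 1; j <= m; j++) {
        split(entries[j], f, "[|]")
        gene = f[4]
        if (gene == "") continue
        key = $1 SUBSEP $2 SUBSEP $3 SUBSEP $4 SUBSEP $5 SUBSEP gene
        # Count each sample/site/gene once, however many transcripts it has.
        if (!(key in seen)) {
          seen[key] = 1
          count[$1 "\t" gene]++
        }
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' | sort
}
```

Usage (assuming bcftools is on the PATH): `bcftools query -f '[%SAMPLE\t%CHROM\t%POS\t%REF\t%ALT\t%GT\t%INFO/ANN\n]' all_multi.vcf.gz | count_variants_per_gene > gene_counts.tsv`. The result is a long table (sample, gene, count), which is usually easier to reshape into a gene-by-sample matrix in R or pandas than to build directly in awk.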
Oh, of course, I completely missed that! Thanks again for your help.