Hi all,
I want to count the number of SNPs on each chromosome in a raw VCF file. What is the best way to do this?
Best regards,
Mostafa
UPDATE 2021: if your VCF is compressed and indexed, this prints the number of records per contig: bcftools index -s indexed.vcf.gz
grep -v "^#" in.vcf | cut -f 1 | sort | uniq -c
Pierre's script works for me, Mostafa:
grep -v "^#" test.vcf | cut -f 1 | sort | uniq -c
16011 1
7308 10
9565 11
9149 12
3311 13
5881 14
5360 15
7016 16
8611 17
2896 18
9895 19
11621 2
3881 20
2472 21
3881 22
9215 3
7464 4
7805 5
10110 6
7991 7
6023 8
6898 9
37 MT
3218 X
21 Y
Chromosome 1 has 16011 variants... chromosome 9 has 6898, et cetera.
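Note that the pipeline above counts every variant record, indels included. If only SNPs are wanted, a minimal sketch with awk could filter on the REF and ALT columns first. The function name count_snps_per_chrom is made up for illustration, and the sketch assumes that a SNP is a record whose REF and every ALT allele is a single base:

```shell
# Count only SNP records per chromosome in an uncompressed VCF.
# Assumption: a record is a SNP when REF is one base and every
# comma-separated ALT allele is also one base (A, C, G or T).
count_snps_per_chrom() {
    awk -F '\t' '
        /^#/ { next }                                # skip header lines
        $4 !~ /^[ACGTacgt]$/ { next }                # REF must be a single base
        {
            n = split($5, alt, ",")
            for (i = 1; i <= n; i++)
                if (alt[i] !~ /^[ACGTacgt]$/) next   # a non-SNP ALT allele excludes the record
            count[$1]++
        }
        END { for (c in count) print count[c], c }   # same "count chromosome" layout as uniq -c
    ' "$1" | sort -k2,2
}
```

Usage: count_snps_per_chrom in.vcf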
Your input VCF should be properly formatted and also be uncompressed.
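If the file is gzip- or bgzip-compressed, one option is to decompress it on the fly rather than writing an uncompressed copy. A minimal sketch, where the function name count_variants_per_chrom_gz is only illustrative:

```shell
# Same counting pipeline as above, but reading a .vcf.gz file.
# gzip -dc writes the decompressed content to stdout and leaves the file untouched.
count_variants_per_chrom_gz() {
    gzip -dc "$1" | grep -v "^#" | cut -f 1 | sort | uniq -c
}
```

Usage: count_variants_per_chrom_gz in.vcf.gz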
Try vcfstats from RTG Tools. Note that it reports statistics per sample, not per chromosome. If you want counts per chromosome and per sample, you may have to write a script.
Normalize your VCF and then count the records per chromosome with datamash, which is available in most Linux repositories.
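A minimal sketch of how such a datamash count might look, not necessarily the exact command meant here. It assumes GNU datamash is installed and that the VCF has already been normalized and is uncompressed; the function name count_per_chrom_datamash is illustrative:

```shell
# Group the CHROM column and count the records in each group.
# -s sorts the input first (grouping needs sorted input), -g 1 groups on column 1.
count_per_chrom_datamash() {
    grep -v "^#" "$1" | cut -f 1 | datamash -s -g 1 count 1
}
```

Usage: count_per_chrom_datamash in.vcf (prints one tab-separated line per chromosome: name, then count)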
with awk:
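A minimal sketch of an awk-based count, not necessarily the one intended here. It assumes an uncompressed VCF, and the function name count_per_chrom_awk is illustrative:

```shell
# Tally non-header lines by their first column (CHROM) in a single pass.
# awk does not guarantee the order of "for (c in count)", so the result is sorted afterwards.
count_per_chrom_awk() {
    awk '!/^#/ { count[$1]++ } END { for (c in count) print c "\t" count[c] }' "$1" | sort
}
```

Usage: count_per_chrom_awk in.vcf (prints one tab-separated line per chromosome: name, then count). Unlike the sort | uniq -c pipeline, this only sorts the short summary rather than every record.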
I have adapted your title to make it more descriptive of what you are asking.