How to calculate the number of SNPs in each sample in a multi-sample VCF file
9 weeks ago
kmkdesilva ▴ 90

Hi everyone,

I have a multi-sample VCF file with 23 samples, generated by the GATK pipeline. I want to find out how many SNPs each sample has (the sum of 1/1 and 0/1 SNP genotypes per sample). Can someone please tell me how to do this?

Thank you

vcf gatk multi SNP
9 weeks ago

bcftools stats --samples '-' in.vcf.gz


Thank you, Pierre.

The following is a portion of the output produced by bcftools stats for my multi-sample VCF file. The sum of columns [5]nNonRefHom and [6]nHets gives the number of SNPs in each sample.

I wonder whether the [10]average depth value equals the average depth of coverage that would be found in the BAM files of the respective samples. Please let me know if you have any insight into column [10]average depth.

# PSC, Per-sample counts. Note that the ref/het/hom counts include only SNPs, for indels see PSI. The rest include both SNPs and indels.

# PSC   [2]id   [3]sample       [4]nRefHom      [5]nNonRefHom   [6]nHets        [7]nTransitions [8]nTransversions       [9]nIndels     [10]average depth     [11]nSingletons [12]nHapRef     [13]nHapAlt     [14]nMissing

PSC     0       3517    72867328        1946700 3767477 3734562 1897030 732106  20.7    340848  0       0       336589
PSC     0       3519    72781946        1910019 3885015 3792473 1919999 737150  22.6   351755  0       0       336253
PSC     0       683610  74080947        1261182 3399064 3048662 1552180 569521  17.9    107872  0       0      346298
PSC     0       686521  74340225        1224973 3245773 2918053 1489168 556105  18.5    95907   0       0       288281
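The per-sample sum described above (column [5]nNonRefHom plus column [6]nHets) can be computed directly from the PSC rows. This is a minimal sketch, not part of bcftools itself: the function name `psc_snp_counts` is made up for illustration, `in.vcf.gz` is a placeholder file name, and the code assumes the PSC rows are tab-separated as bcftools stats normally writes them.

```shell
# Hypothetical helper: reads `bcftools stats -s -` output on stdin and
# prints "sample<TAB>nNonRefHom+nHets" for every PSC data row.
# Header lines start with "# PSC", so they do not match $1 == "PSC".
psc_snp_counts() {
    awk -F'\t' '$1 == "PSC" { printf "%s\t%d\n", $3, $5 + $6 }'
}

# Usage (in.vcf.gz is a placeholder):
#   bcftools stats -s - in.vcf.gz | psc_snp_counts
```

For sample 3517 in the output above, this gives 1946700 + 3767477 = 5714177 SNPs.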