Count number of variants per sample
1
0
Entering edit mode
2.2 years ago

Hello!

I have two VCF files as output of the gatk pipeline, one for including the snps and one the indels. I would like to count the number of snps and the number of indels per sample. The VCF files correspond to 30 samples, from which I am interested in 11, that I have listed in my_samples_list.txt.

My code:

while read sample_name;
do
        n_indels=$(bcftools view -s $sample_name recal_indels.vcf | grep -v "^#" | wc -l)
        n_snps=$(bcftools view -s $sample_name recal_snps.vcf | grep -v "^#" | wc -l)
        printf "%s\t%s\t%s\n" "$sample_name" "$n_snps" "$n_indels" >> n_of_variants.txt
done < ./my_samples_list.txt

Desired output:

sample_name #_snps #_indels
HG00321 111 222
HG00323 333 444
vcf bcftools gatk • 977 views
ADD COMMENT
1
Entering edit mode
2.2 years ago
$ bcftools stats --samples - input.vcf.gz | grep PSC

# PSC, Per-sample counts. Note that the ref/het/hom counts include only SNPs, for indels see PSI. The rest include both SNPs and indels.
# PSC   [2]id   [3]sample   [4]nRefHom  [5]nNonRefHom   [6]nHets    [7]nTransitions [8]nTransversions   [9]nIndels  [10]average depth   [11]nSingletons [12]nHapRef [13]nHapAlt [14]nMissing
PSC 0   S1  36  2   5   3   4   2   0.0 9   0   0   0
PSC 0   S2  30  8   7   1   14  0   0.0 0   0   0   0
PSC 0   S3  30  8   7   1   14  0   0.0 0   0   0   0
PSC 0   S4  31  6   5   4   7   3   0.0 13  0   0   0
PSC 0   S5  37  8   0   2   6   0   0.0 8   0   0   0
ADD COMMENT

Login before adding your answer.

Traffic: 1944 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6