Question: how to count variants par sample per chromosome in a vcf file?
0
gravatar for nagarsaggi
5 months ago by
nagarsaggi0
nagarsaggi0 wrote:

I want to count number of the variants called on each chromosome for each sample from a multi sample vcf file. Any help would be really appropriated. Thanks Ram

snp • 609 views
ADD COMMENTlink modified 5 months ago by chrchang5234.6k • written 5 months ago by nagarsaggi0
1

Use vcfstats function from Rtgtools per sample stats

ADD REPLYlink modified 5 months ago • written 5 months ago by cpad011211k
1
gravatar for sacha
5 months ago by
sacha1.6k
France
sacha1.6k wrote:

Split your vcf file by sample and count how many times chromosom appear in each file .

FILE=yourfile.vcf
for sample in `bcftools query -l $FILE`
do 
      bcftools view -c1 -H -s $sample -o ${sample}.vcf $FILE
      cat ${sample}.vcf |cut -f1|uniq -c > ${sample}.count
done
ADD COMMENTlink modified 5 months ago • written 5 months ago by sacha1.6k
1
gravatar for JC
5 months ago by
JC7.4k
Mexico
JC7.4k wrote:

You can use a bash command like:

gunzip -c myfile.vcf.gz | grep -v "#" | cut -f1 | uniq -c

Explanation:

  • gunzip -c myfile.vcf.gz -> decompress the VCF
  • grep -v "#" -> removes all header lines
  • cut -f1 -> keeps only the first column (which is the chromosome id)
  • uniq -c -> counts unique elements in a list
ADD COMMENTlink written 5 months ago by JC7.4k

Thank JC for the prompt reply, but with this commend, I get same number of the variants on each chromosome across the samples. The commend counts all the chromosome sites where a variants is called across the samples which would be same for all the samples. What I want is the number of the variants called on each chromosome in a individual sample.

ADD REPLYlink written 5 months ago by nagarsaggi0

you can extract samples per individual with VCFtools, then use the same strategy to count called variants

ADD REPLYlink written 5 months ago by JC7.4k

Yes, I tried this but still get same number of variant per site for all individual samples.

ADD REPLYlink written 5 months ago by nagarsaggi0

across the samples.

this was not specified in your original question. How are we supposed to know it ?

I updated the question.

ADD REPLYlink written 5 months ago by Pierre Lindenbaum116k
1
gravatar for Pierre Lindenbaum
5 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

A one liner using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

$ java -jar bioalcidaejdk.jar  -e 'stream().flatMap(V->V.getGenotypes().stream().filter(G->G.isCalled()&&!G.isHomRef()).map(G->V.getContig()+"\t"+G.getSampleName())).collect(Collectors.groupingBy(Function.identity(),Collectors.counting())).forEach((K,C)->println(K+":"+C));'  otavirus_rf.vcf.gz

RF07    S4:1
RF07    S3:2
RF05    S2:4
RF07    S2:2
RF11    S4:1
RF03    S5:2
RF09    S1:1
RF09    S2:1
RF01    S5:1
RF09    S3:1
RF03    S2:4
RF03    S1:1
RF09    S5:1
RF05    S3:4
RF03    S4:2
RF07    S5:1
RF05    S4:2
RF03    S3:4
RF06    S3:1
RF06    S2:1
RF04    S1:3
RF06    S1:1
RF10    S4:2
RF10    S1:1
RF02    S4:1

details:

stream(). /* get a stream of variant */
    flatMap( 
        V->V.getGenotypes(). /* get the genotypes */
            stream(). /* convert to a stream of genotypes */
            filter(G->G.isCalled() && !G.isHomRef()). /* discard NO_CALL && HOM_REF */
            map(G->V.getContig()+"\t"+G.getSampleName()) /* convert to key=contig+sample */
        ).
    collect( 
        Collectors.groupingBy(Function.identity(), Collectors.counting()) /* count each key */
        ).forEach((K,C)->println(K+" : "+C)); /* print */
ADD COMMENTlink written 5 months ago by Pierre Lindenbaum116k
0
gravatar for prasundutta87
5 months ago by
prasundutta87310
prasundutta87310 wrote:

BCFtools stats with any one of the below two options is what you need..

 -s, --samples <list>               list of samples for sample stats, "-" to include all samples
 -S, --samples-file <file>          file of samples to include

Example:

bcftools stats -s - <multisample VCF file>
ADD COMMENTlink modified 5 months ago • written 5 months ago by prasundutta87310
0
gravatar for chrchang523
5 months ago by
chrchang5234.6k
United States
chrchang5234.6k wrote:

(deleted, initially missed the fact that hom-ref calls are not wanted here)

ADD COMMENTlink modified 5 months ago • written 5 months ago by chrchang5234.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1864 users visited in the last hour