how to estimate heterozygosity by sample from a multi-sample vcf file
3
1
Entering edit mode
4.2 years ago
nagarsaggi ▴ 40

Hi All, I want to estimate heterozygosity by samples from a multisample vcf file. I mapped whole-genome short reads data of many samples to a reference genome and did joint variants calling using freebayes which outputted a multisample vcf file. I would like to calculate % heterozygosity for each sample. If you know a good script or a tool that do it, please share with me.

Many Thanks  Ram

snp • 4.3k views
ADD COMMENT
4
Entering edit mode
4.2 years ago

plink2 --vcf <VCF filename> --sample-counts cols=hom,het

reports the number of homozygous and heterozygous calls for each sample; see https://www.cog-genomics.org/plink/2.0/basic_stats#sample_counts for more details.

ADD COMMENT
1
Entering edit mode
4.2 years ago
trausch ★ 1.9k
bcftools stats -s - <input.vcf.gz> | grep "^PSC"-B 1
ADD COMMENT
0
Entering edit mode
4.2 years ago

using bioalcidaejdk: http://lindenb.github.io/jvarkit/BioAlcidaeJdk.html

and the following script:

final Map<String,Counter<GenotypeType>> sample2count = new HashMap<>();
stream().flatMap(V->V.getGenotypes().stream()).forEach(G->{
    final String sn = G.getSampleName();
    Counter<GenotypeType> c= sample2count.get(sn);
    if(c==null) {
                c=new Counter<GenotypeType>();
                sample2count.put(sn,c);
                }
    c.incr(G.getType());
    });


out.print("#name");
for(GenotypeType gt:GenotypeType.values()) out.print("\t"+gt.name());
out.println();

for(final String sn: sample2count.keySet()) {
  out.print(sn);
  Counter<GenotypeType> c= sample2count.get(sn);
 for(GenotypeType gt:GenotypeType.values()) out.print("\t"+c.count(gt));
  out.println();
}

usage:

java -jar dist/bioalcidaejdk.jar -f  biostar.code src/test/resources/rotavirus_rf.vcf.gz

will print the types of genotypes for each sample:

#name  NO_CALL  HOM_REF  HET  HOM_VAR  UNAVAILABLE  MIXED
S3     0        30       7    8        0            0
S4     0        31       7    7        0            0
S5     0        37       0    8        0            0
S1     0        36       7    2        0            0
S2     0        30       7    8        0            0
ADD COMMENT

Login before adding your answer.

Traffic: 1993 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6