Tool to check which alternate allele is dominant across samples for each line of a VCF file
3.0 years ago
halo22

Hello All,

I am very new to WGS analysis. I have a multisample VCF file that I have annotated with snpEff. I want to find which alternate alleles are shared between samples at each genomic position. For example, at chr1 position 1001 the reference allele is A, the alternate alleles seen are T, AAT, TTT and AA, and there are 10 samples. I want to count how many samples carry TTT, how many carry AA, and so on. That way I can see which allele is dominant (most common) across samples at each position.
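A minimal sketch of that per-position count in Python, assuming a plain (uncompressed) VCF data line whose FORMAT column includes a GT key. The function name `samples_per_allele` and the example record are made up for illustration. Each sample is counted once per distinct allele it carries, so a 1/1 homozygote adds one to that allele, and missing calls (`./.`) are skipped.

```python
from collections import Counter


def samples_per_allele(vcf_line):
    """Count how many samples carry each allele at one VCF data line.

    Hypothetical helper: assumes an uncompressed, tab-separated VCF record
    whose FORMAT column (column 9) contains a GT key.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alts = fields[3], fields[4].split(",")
    alleles = [ref] + alts  # GT index 0 = REF, 1..n = ALT alleles in order
    gt_pos = fields[8].split(":").index("GT")

    counts = Counter()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        # Handle phased (0|1) and unphased (0/1) calls; '.' is a missing allele.
        # A set counts each sample at most once per allele (1/1 adds one, not two).
        carried = {a for a in gt.replace("|", "/").split("/") if a != "."}
        for idx in carried:
            counts[alleles[int(idx)]] += 1
    return counts


# Made-up record: REF A, ALT T,AAT,TTT,AA, five samples (one missing call)
line = "chr1\t1001\t.\tA\tT,AAT,TTT,AA\t50\tPASS\t.\tGT\t0/3\t3/3\t3/4\t./.\t0|4"
counts = samples_per_allele(line)
print(counts.most_common(1))  # → [('TTT', 3)]
```

Looping this over every non-header line of the file gives the dominant allele per position; for a gzipped VCF you would open it with `gzip.open(path, "rt")` instead of `open`.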

All help is appreciated.

Thanks.

3.0 years ago
Carambakaracho

I've had very good experiences with the vcfR package for R (available from CRAN).

If you're new to R too, and this is a one-off project, there's nothing wrong with just using Excel and a few Text to Columns operations to split the data (provided your machine can handle a file of that size). It's a bit tedious, but the learning curve is less steep.
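For comparison, the two-level split that repeated text-to-columns does (first on tabs, then each sample column on `:`) takes only a few lines of Python. This is a sketch: `split_sample_columns` is a hypothetical helper, it assumes one uncompressed, tab-separated data line with the FORMAT keys in column 9, and it does not read sample names (those come from the `#CHROM` header line).

```python
def split_sample_columns(vcf_line):
    """Split one VCF data line into its 9 fixed columns plus one dict per sample.

    Hypothetical helper mirroring two text-to-columns passes: first on tabs,
    then each sample column on ':' keyed by the FORMAT field names.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fixed = fields[:9]  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
    format_keys = fixed[8].split(":")
    # zip stops at the shorter list, so trailing FORMAT fields a sample omits are absent
    samples = [dict(zip(format_keys, col.split(":"))) for col in fields[9:]]
    return fixed, samples


line = "chr1\t1001\t.\tA\tT,AA\t50\tPASS\t.\tGT:AD:DP\t0/1:12,8:20\t2/2:0,0,15:15"
fixed, samples = split_sample_columns(line)
print(fixed[3], fixed[4])  # → A T,AA
print(samples[1]["GT"], samples[1]["AD"])  # → 2/2 0,0,15
```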

Thank you, guys! I wrote my own to get this done.

3.0 years ago

plink2 --vcf <VCF path> --freq counts

reports the per-variant allele counts across all samples for you (written to a .afreq file). (Remove 'counts' if you want proportions instead.)
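As far as I understand plink2's --freq report, it counts allele copies (observations) rather than samples, so a homozygous 3/3 sample contributes two copies of allele 3. A minimal Python sketch of that counting, and of turning counts into proportions, assuming diploid or haploid GT strings; `allele_counts_and_freqs` is a hypothetical name and this is not plink2's own code.

```python
def allele_counts_and_freqs(genotypes, n_alleles):
    """Count allele copies across GT strings and convert them to proportions.

    Hypothetical helper, not plink2's implementation. `n_alleles` is REF plus
    the number of ALT alleles; missing alleles ('.') are not counted.
    """
    counts = [0] * n_alleles  # index 0 = REF, 1..n = ALT alleles
    for gt in genotypes:
        for a in gt.replace("|", "/").split("/"):
            if a != ".":
                counts[int(a)] += 1  # a 3/3 homozygote adds two copies of allele 3
    observed = sum(counts)
    freqs = [c / observed for c in counts] if observed else [0.0] * n_alleles
    return counts, freqs


# Made-up genotypes at a site with REF A and ALT T,AAT,TTT,AA (5 alleles)
counts, freqs = allele_counts_and_freqs(["0/3", "3/3", "3/4", "./.", "0|4"], 5)
print(counts)  # → [2, 0, 0, 4, 2]
print(freqs)   # → [0.25, 0.0, 0.0, 0.5, 0.25]
```

Here TTT (index 3) has 4 copies but is carried by only 3 samples, because one of them is homozygous. Which number you want depends on whether the question is about samples or allele copies.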