Tool to check what alternate allele is dominant across samples per line of the VCF file.
3.0 years ago
halo22 ▴ 290

Hello All,

I am very new to WGS analysis. I have a multisample VCF file that I have annotated using SnpEff. I would like to find out which alternate alleles are shared between samples at each genomic location. For example: at Chr1, position 1001, the reference allele is A, the alternate alleles seen are T, AAT, TTT and AA, and there are 10 samples. I want to count how many samples carry TTT, how many carry AA, and so on. That way I can see which allele is dominant across samples at each position.

All help is appreciated

Thanks.

next-gen WGS • 831 views
3.0 years ago
Carambakaracho ★ 3.1k

I have had very good experiences with the vcfR package for R (available on CRAN).

In case you're new to R too, and this is a one-off project, there's nothing wrong with just using Excel and multiple text-to-column operations to split the data (provided your machine is powerful enough to handle it). It's a bit tedious, but the learning curve is less steep.

0

there's nothing wrong to just use Excel

1

:-D Damn it, I got the Excel shame AND didn't realise the thread was 20 days old.

@halo22 you'd better not use my Excel advice and try to hire with Pierre, I guess.

@pierre or anyone, as this is most likely irrelevant for the OP anyway: does Biostars support strikethrough markdown?

Cheers

0

Thank you, guys! I wrote my own to get this done.

3.0 years ago

plink2 --vcf <VCF path> --freq counts

gets this information for you. (Remove 'counts' if you want proportions instead.)
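
If you want the number of samples carrying each alternate allele (rather than allele copies), here is a minimal sketch in plain Python of the kind of count described in the question. It is not the OP's script. The function name `count_alt_carriers` and the example record are made up for illustration, and it assumes an uncompressed, tab-delimited VCF with a GT field in the FORMAT column. A sample is counted once per allele it carries, whether heterozygous or homozygous.

```python
from collections import Counter


def count_alt_carriers(vcf_lines):
    """Yield (chrom, pos, ref, {alt_allele: n_samples}) for each VCF record.

    Hypothetical helper: assumes tab-delimited VCF text with sample columns
    starting at column 10 and a GT entry in the FORMAT column.
    """
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this record
        chrom, pos, ref, alt = fields[0], fields[1], fields[3], fields[4]
        alleles = [ref] + alt.split(",")  # genotype index 0 = REF, 1.. = ALT
        fmt = fields[8].split(":")
        if "GT" not in fmt:
            continue  # cannot count carriers without genotypes
        gt_idx = fmt.index("GT")

        carriers = Counter()
        for sample in fields[9:]:
            gt = sample.split(":")[gt_idx]
            # Collect the distinct ALT indices in this sample's genotype;
            # missing calls (".") and REF (0) are ignored.
            alt_indices = {
                int(a)
                for a in gt.replace("|", "/").split("/")
                if a.isdigit() and int(a) > 0
            }
            for i in alt_indices:
                carriers[alleles[i]] += 1
        yield chrom, int(pos), ref, dict(carriers)


# Made-up example record based on the position described in the question.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4",
    "Chr1\t1001\t.\tA\tT,AAT,TTT,AA\t.\tPASS\t.\tGT:DP\t0/3:12\t3/3:9\t1|4:20\t./.:0",
]

results = list(count_alt_carriers(example))
for chrom, pos, ref, counts in results:
    print(chrom, pos, ref, counts)
```

For a real file you would pass an open file handle instead of the `example` list (for example `open("my.vcf")`, or `gzip.open(path, "rt")` for a bgzipped file). The allele with the largest count at each position is the dominant one in the sense asked about above.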