Why separate multi-allelic variants from bi-allelic variants?
1
0
3 months ago

Hi all,

Why do we separate multi-allelic variants from bi-allelic variants in our analyses?

In other words, why do most researchers separate the multi-allelic variants from the bi-allelic ones in their analyses, and mostly keep only the bi-allelic ones?

genetics population • 315 views
1
3 months ago

One answer is that managing multi-allelic calls is difficult from an analysis perspective. In an association test, like in GWAS, as each site (variant) is analysed independently, it makes no difference to split a multi-allelic call into 2 separate bi-allelic calls, and test each separately.

Also, in a germline sample, it makes little sense, biologically, that a multi-allelic call would even be present, unless our VCF contains data from more than 1 individual. On the other hand, in a cancer context, considering a bulk tumour biopsy sample, we would expect many multi-allelic calls.

You will undoubtedly find many more opinions online via a search.

Kevin

Edit: to add information based on chrchang's response, we can tolerate multi-allelic calls depending on how we code them. Consider this call:

Ref:               A
Var:               T,G
Var allele counts: 56,2


We can potentially sum up the total allele count for the variants and regard it as a bi-allelic site, meaning a total of 58, or split it into 2 calls for T (56) and G (2).
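
As a rough illustration of those two codings, here is a minimal Python sketch. The `Call` class, the `non-ref` label, and both function names are hypothetical, made up for this example; they are not part of any VCF library, and a real tool would also have to rewrite genotypes and INFO fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Call:
    """Hypothetical, simplified variant call (not a real VCF record)."""
    ref: str
    alts: List[str]
    alt_counts: List[int]


def collapse_to_biallelic(call: Call) -> Call:
    """Treat the site as REF vs. any non-REF allele by summing ALT counts."""
    # "non-ref" is just an illustrative label for the pooled ALT alleles.
    return Call(call.ref, ["non-ref"], [sum(call.alt_counts)])


def split_to_biallelic(call: Call) -> List[Call]:
    """Emit one bi-allelic call per ALT allele, each with its own count."""
    return [
        Call(call.ref, [alt], [count])
        for alt, count in zip(call.alts, call.alt_counts)
    ]


# The call from the example above: Ref A, Var T,G, counts 56,2.
example = Call(ref="A", alts=["T", "G"], alt_counts=[56, 2])

collapsed = collapse_to_biallelic(example)
split = split_to_biallelic(example)

print(collapsed.alt_counts)                       # [58]
print([(c.alts[0], c.alt_counts[0]) for c in split])  # [('T', 56), ('G', 2)]
```

Which coding is appropriate depends on the analysis: collapsing loses the distinction between T and G, while splitting treats the two ALT alleles as if they were independent sites.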

1

Note that splitting a multi-allelic call does change plink2 --glm, plink2 --pca, plink2 --hwe, and quite a few other results. All should be more accurate if the call is not split. (With that said, sometimes you have no choice: e.g. if downstream pipeline steps can't handle multiallelic --glm output, go ahead and split first.)

0

Thanks for the additional information, chrchang! I had figured that this was a question with no single answer.