Hi,
I am using VCFtools to parse VCF files.
I can use it to generate a 012 matrix. This matrix is 2D, with shape (number of individuals, number of SNPs).
Each cell holds the number of occurrences of the alternative allele for a specific SNP in a specific individual.
This is great for biallelic data, meaning that every SNP has a single alternative allele.
In my case there are at most n alleles per SNP, and I would like to have n matrices, one per "allele index", where each cell specifies how many occurrences of that allele there are in the specific individual at the specific SNP.
Is anyone familiar with a tool that can provide that?
Thanks
I'd like to ask: what do 0, 1, and 2 mean in the last df? Also, how does GT change after splitting the alt alleles? For example, what does 1/2 become after the split?
The numbers in the last df represent allele counts. For instance, since the A sample is heterozygous for chr1-100-G-A, the sample has 1, and so on.
I'm not sure I understand your second question. If you are asking about multiallelic loci, they will be split into multiple rows. For instance, the position chr1-101 has two alternative alleles, but in the last df it was split into three rows (two alt + one ref).
Does the last df represent allele counts for the reference? As for the second question: yes, I know that multiallelic sites are split, but what does 1/2 change to? Does it become 0/1, and then the allele is counted? Also, what do GG and TT mean in the last df? And why not split only the multiallelic sites and keep the biallelic ones as they are?
No, they represent the counts for both reference and alternative alleles. Continuing to use the A sample as an example, it is heterozygous for chr1-100-G-A. Therefore, it has a count of 1 for chr1-100-G-A (alt) and 1 for chr1-100-G-G (ref).
Yes.
That depends on what your end goal is. When I first developed this function, I wanted to quickly get allele counts for both alt and ref alleles. You can easily modify the above code to extract allele counts only for alt if that's what you want.
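To make the counting concrete, here is a minimal sketch in plain Python. It is not fuc's implementation and it does not read a VCF file; the function names (allele_counts, per_allele_matrices) and the toy genotypes are made up for illustration. It assumes diploid-style GT strings such as 0/1 or 1|2, where index 0 is the reference allele, and it skips missing calls.

```python
import re


def allele_counts(gt, n_alleles):
    """Count how often each allele index occurs in one GT string.

    Index 0 is the reference allele; 1..n_alleles-1 are alternative alleles.
    Missing calls ('.') are not counted.
    """
    counts = [0] * n_alleles
    for call in re.split(r"[/|]", gt):
        if call != ".":
            counts[int(call)] += 1
    return counts


def per_allele_matrices(genotypes, n_alleles):
    """Build one (individuals x sites) count matrix per allele index.

    genotypes[i][j] is the GT string of individual i at site j.
    """
    matrices = [[[0] * len(row) for row in genotypes] for _ in range(n_alleles)]
    for i, row in enumerate(genotypes):
        for j, gt in enumerate(row):
            for allele, count in enumerate(allele_counts(gt, n_alleles)):
                matrices[allele][i][j] = count
    return matrices


# Hypothetical toy data: 2 individuals x 2 sites, at most 3 alleles per site.
genotypes = [
    ["0/1", "1/2"],
    ["1/1", "0/0"],
]
matrices = per_allele_matrices(genotypes, n_alleles=3)

# A 1/2 genotype contributes one count to each alternative allele and none to ref.
print(allele_counts("1/2", 3))
# Drop the reference matrix if only alt counts are wanted.
alt_only = matrices[1:]
```

Here matrices[0] holds the reference counts and matrices[1:] holds one matrix per alternative allele, which is the "n matrices" layout asked about in the original question.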
Also, I want to ask whether the fuc package works on Windows with Python. I can't install it.
It should work on Windows too. It's just a Python library. If you can use popular libraries like pandas and numpy, you should be able to use fuc as well. I recommend using conda to install fuc.
I'm using conda, but I think the problem is that it needs pysam, and I haven't been able to install that so far.
Try this: