17 months ago
أرْوَى
Hi
I'm new to the field.
I have a large VCF file that has many variants across many samples. I extracted the GT (genotype) for each sample and variant to build a matrix for a machine learning algorithm. Now I need to encode these genotypes so I can use them for machine learning.
I see some strange numbers in the GT values, such as [-1 -1], [1 5], [1 6], [0 -1], [1 -1], [2 -1].
So, what do these values mean (the data are on hg38), and how can I encode them for machine learning?
Where do you find this information in the VCF? How do you extract the matrix?
I'm extracting the GT using the scikit-allel package (Python). When I compare what I actually extract with the original VCF, I find that -1 corresponds to "." (-1 = .).
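For what it's worth, here is a minimal sketch of one common way to encode such a matrix, under a few assumptions. The numbers in GT are allele indices for that variant, not something specific to hg38: 0 is the REF allele, 1 is the first ALT allele, and 5 or 6 would be the fifth or sixth ALT allele at a multiallelic site. -1 is how scikit-allel represents a missing allele ("." in the VCF), so [0 -1] is a call where one of the two alleles is missing. The function name `encode_alt_counts` and the toy array are made up for illustration, and the input is assumed to be an integer array of shape (variants, samples, ploidy) like the one scikit-allel returns for GT.

```python
import numpy as np

def encode_alt_counts(gt):
    """Encode genotypes as the number of non-reference alleles per call.

    gt: integer array of shape (n_variants, n_samples, ploidy), where
        0 = REF, 1..n = ALT allele index, and a negative value = missing.
    Returns a float array of shape (n_variants, n_samples) holding
    0, 1 or 2 for diploid calls, and NaN wherever any allele is missing.
    """
    gt = np.asarray(gt)
    # A call is treated as missing if at least one allele is missing,
    # so partial calls such as [0 -1] also become NaN.
    is_missing = (gt < 0).any(axis=-1)
    # Count alleles that are not the reference (any ALT index counts as 1).
    alt_counts = (gt > 0).sum(axis=-1).astype(float)
    alt_counts[is_missing] = np.nan
    return alt_counts

# Toy example: 2 variants x 3 samples, diploid.
gt = np.array([
    [[0, 0], [0, 1], [-1, -1]],
    [[1, 5], [1, -1], [1, 1]],
])
encoded = encode_alt_counts(gt)
print(encoded)
```

Note that this collapses all ALT alleles into one class, so [1 5] and [1 1] both become 2. If the distinction matters for your model, splitting multiallelic sites into separate records before extraction may be a better fit. The NaN values still need handling (imputation, or dropping variants or samples with too many missing calls) before most machine learning libraries will accept the matrix.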