14 days ago
Mohadese
I have a CSV file like the following and I would like to convert it to VCF. How can I do it using Python?
Sample Name,rsID,Chr,Position,Allele1 - Plus,Allele2 - Plus,genotype
2,1:103380393,1,103380393,G,G,GG
2,1:106737318,1,106737318,T,T,TT
2,1:109439680,1,109439680,A,A,AA
2,1:110228436_CNV_GSTM1,1,110228436,-,-,--
2,1:110228505_CNV_GSTM1,1,110228505,C,C,CC
2,1:110228615_CNV_GSTM1,1,110228615,T,T,TT
2,1:110228695_CNV_GSTM1,1,110228695,G,G,GG
2,1:110229315_CNV_GSTM1,1,110229315,G,G,GG
You do not have enough columns to transform your CSV into VCF ver 4.2. There are just CHROM, POS, ID, REF and ALT, and even that is being generous. In your example the species (human) and the genome assembly version are not specified. Allele1 being the same as Allele2 makes no sense if one thinks of Allele1 as REF. So whatever you do downstream, you need to obtain the REF from the proper genome assembly.
I also have a file containing the following information
This looks like a result from the Infinium Global Diversity Array. The best option is to contact Illumina customer support. Check whether GenomeStudio can output a meaningful, well-formatted VCF.
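If you still want to try it in Python, the following is only a minimal sketch of the point made above: the CSV supplies CHROM, POS, ID and the two plus-strand alleles, while REF has to come from the genome assembly. The function names `genotype_to_vcf_fields` and `csv_to_vcf` and the `ref_lookup` callable are hypothetical helpers invented for this illustration. The sketch assumes a single sample, single-base alleles, and that rows with `-` alleles are simply skipped. The reference bases in `demo_ref` are made-up placeholders, not real genome bases; in practice the lookup would read from an indexed FASTA of the correct assembly.

```python
import csv
import io


def genotype_to_vcf_fields(ref, allele1, allele2):
    """Return (ALT, GT) for two plus-strand alleles, given the reference base."""
    alts = []
    for allele in (allele1, allele2):
        if allele != ref and allele not in alts:
            alts.append(allele)
    # Allele index 0 is REF; ALT alleles are numbered from 1 in order of appearance.
    index = {ref: 0}
    for i, alt in enumerate(alts, start=1):
        index[alt] = i
    gt = f"{index[allele1]}/{index[allele2]}"
    return (",".join(alts) if alts else "."), gt


def csv_to_vcf(csv_text, ref_lookup):
    """Convert the CSV text to VCF text; ref_lookup(chrom, pos) must return REF."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sample = rows[0]["Sample Name"] if rows else "SAMPLE"  # single-sample assumption
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    for row in rows:
        a1, a2 = row["Allele1 - Plus"], row["Allele2 - Plus"]
        if "-" in (a1, a2):
            continue  # assumption: '-' rows (no-call or CNV probes) are skipped
        chrom, pos = row["Chr"], int(row["Position"])
        ref = ref_lookup(chrom, pos)
        alt, gt = genotype_to_vcf_fields(ref, a1, a2)
        lines.append("\t".join(
            [chrom, str(pos), row["rsID"], ref, alt, ".", ".", ".", "GT", gt]
        ))
    return "\n".join(lines) + "\n"


demo_csv = (
    "Sample Name,rsID,Chr,Position,Allele1 - Plus,Allele2 - Plus,genotype\n"
    "2,1:103380393,1,103380393,G,G,GG\n"
    "2,1:106737318,1,106737318,T,T,TT\n"
    "2,1:110228436_CNV_GSTM1,1,110228436,-,-,--\n"
)

# Placeholder REF bases for the demo only; NOT taken from any real assembly.
demo_ref = {("1", 103380393): "G", ("1", 106737318): "C"}

if __name__ == "__main__":
    print(csv_to_vcf(demo_csv, lambda chrom, pos: demo_ref[(chrom, pos)]), end="")
```

A homozygous genotype that matches REF becomes `0/0` with ALT set to `.`, and one that differs becomes `1/1`. Without a trustworthy REF for the right assembly, none of these genotype codes are meaningful, which is why a VCF export from the vendor software is the safer route.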