13 months ago
Mohadese
I have a CSV file like the following and I would like to convert it to VCF. How can I do it using Python?
Sample Name,rsID,Chr,Position,Allele1 - Plus,Allele2 - Plus,genotype
2,1:103380393,1,103380393,G,G,GG
2,1:106737318,1,106737318,T,T,TT
2,1:109439680,1,109439680,A,A,AA
2,1:110228436_CNV_GSTM1,1,110228436,-,-,--
2,1:110228505_CNV_GSTM1,1,110228505,C,C,CC
2,1:110228615_CNV_GSTM1,1,110228615,T,T,TT
2,1:110228695_CNV_GSTM1,1,110228695,G,G,GG
2,1:110229315_CNV_GSTM1,1,110229315,G,G,GG
You do not have enough columns to transform your CSV into VCF ver 4.2. There are just CHROM, POS, ID, REF and ALT, and even that is being generous. In your example the species (human) and the genome assembly version are not specified. Allele1 being the same as Allele2 makes no sense if one thinks of Allele1 as REF. So whatever you do downstream, you need to obtain the REF from the proper genome assembly.
I also have a file containing the following information
This looks like output from the Infinium Global Diversity Array. The best option is to contact Illumina customer support. Check whether GenomeStudio can output a meaningful, well-formatted VCF.
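If you still want to attempt the conversion in Python, here is a minimal sketch of what it could look like once the REF base is available. It is not a definitive implementation. The function name `csv_to_vcf` and the `ref_lookup` callable are hypothetical: `ref_lookup(chrom, pos)` is assumed to return the reference base from the correct genome assembly, which the CSV itself does not contain. The sketch also assumes a single sample, SNP-only plus-strand alleles, and rows already sorted by position; it skips no-calls such as `-` and writes no `##contig` header lines.

```python
import csv
import io

VALID_BASES = {"A", "C", "G", "T"}


def csv_to_vcf(csv_text, ref_lookup):
    """Convert genotype CSV text to minimal VCF 4.2 text.

    ref_lookup is a hypothetical callable (chrom, pos) -> reference base;
    it must be backed by the genome assembly the array positions refer to.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Assumption: the whole file describes one sample.
    sample = rows[0]["Sample Name"] if rows else "SAMPLE"
    lines = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t" + sample,
    ]
    for row in rows:
        a1, a2 = row["Allele1 - Plus"], row["Allele2 - Plus"]
        # Skip no-calls ("-") and anything that is not a single base.
        if a1 not in VALID_BASES or a2 not in VALID_BASES:
            continue
        chrom, pos = row["Chr"], int(row["Position"])
        ref = ref_lookup(chrom, pos).upper()
        # ALT holds every observed allele that differs from REF.
        alts = []
        for allele in (a1, a2):
            if allele != ref and allele not in alts:
                alts.append(allele)
        # Genotype indices: 0 = REF, 1.. = ALT alleles in listed order.
        alleles = [ref] + alts
        gt = "/".join(str(alleles.index(a)) for a in (a1, a2))
        lines.append("\t".join([
            chrom, str(pos), row["rsID"], ref,
            ",".join(alts) or ".", ".", ".", ".", "GT", gt,
        ]))
    return "\n".join(lines) + "\n"
```

One way to supply `ref_lookup` would be a small wrapper around an indexed FASTA of the right assembly, for example `pysam.FastaFile.fetch(chrom, pos - 1, pos)`. The chromosome naming (`1` versus `chr1`) has to match the FASTA. A homozygous site that matches REF comes out as `0/0` with ALT set to `.`.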