I just started a new job as a data programmer with an epidemiology group in Seattle, WA. I am primarily a SAS programmer and have no experience with genomic data. A month into the job, I have been asked to convert a genomic text file to VCF. My coworkers bounced around several ideas, and I have been asked to explore R bioinformatics packages, PLINK, and possibly some C++ libraries. I am not versed in any of these :( I searched many sites and found plenty of information on how to read VCF files, but nothing on how to create them.
Our genomic data are huge. I would like to use SAS to read our internal genomic text file and write its contents in VCF format, with the columns in this order: chrom, pos, id, ref, alt, qual, filter, info, format. However, the data I received has the following columns: sample id, snp name, chr, position, allele1-top, allele2-top, x, y, r, and b allele freq.
I was able to map the columns that are straightforward, but I have no idea where allele1-top, allele2-top, x, y, r, and b allele freq map to. I am not a genomics expert, so I am thinking of proposing to my supervisor that someone with genomics experience help me with the mapping.
An alternative might be to first determine whether it is even feasible to generate VCF from our internal data.
Do you have any idea how I might proceed?
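
To make the target concrete, here is a minimal sketch of only the straightforward part of the mapping (chr, position, and snp name to CHROM, POS, and ID). It is written in Python purely for illustration, not as a recommendation over SAS. The dictionary keys, the function name `rows_to_vcf_lines`, and the example rows are all hypothetical. REF is filled with the placeholder `N` and the remaining fields with `.` (missing), because the correct mapping of allele1-top and allele2-top is exactly the open question. The FORMAT and per-sample columns are left out for the same reason.

```python
# Minimal VCF header: the file format line plus the fixed column names.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def rows_to_vcf_lines(rows):
    """Build sites-only VCF lines from long-format rows (one row per sample per SNP).

    Only chr, position, and snp_name are mapped. REF is the placeholder "N",
    and ALT, QUAL, FILTER, and INFO are "." (missing), since the source
    columns for those fields are not known here.
    """
    seen = set()
    lines = list(VCF_HEADER)
    for row in rows:
        site = (str(row["chr"]), int(row["position"]), row["snp_name"])
        if site in seen:
            # The same SNP appears once per sample; emit each site only once.
            continue
        seen.add(site)
        chrom, pos, snp_id = site
        lines.append("\t".join([chrom, str(pos), snp_id, "N", ".", ".", ".", "."]))
    return lines


if __name__ == "__main__":
    # Hypothetical rows, not real data.
    example_rows = [
        {"sample_id": "S1", "snp_name": "snp_A", "chr": "1", "position": "1000"},
        {"sample_id": "S2", "snp_name": "snp_A", "chr": "1", "position": "1000"},
        {"sample_id": "S1", "snp_name": "snp_B", "chr": "2", "position": "2500"},
    ]
    print("\n".join(rows_to_vcf_lines(example_rows)))
```

The sketch keeps sites in input order and does not sort them. Real REF and ALT values would have to come from a reference genome or the array's annotation, which the input columns listed above do not obviously include.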