Dear experts, I have genotype data which I want to use for GWAS. The data contains all the usual columns except the allele columns, i.e. the Ref and Alt alleles. It has all the other information, such as chromosome number, chromosome position, and the alleles in my samples. It has already been aligned to the reference genome, but I am confused about the Ref and Alt alleles. Is there any way to get them? Is there any software that can extract the reference and alternative alleles? The data is not in any standard format; it's just a text file. I need the alleles for the association analysis.
One option would be to use the Ensembl REST API with the overlap/region endpoint to fetch the alleles of known variants at each position. Here's an example with wheat: http://rest.ensembl.org/overlap/region/triticum_aestivum/4A:714193714-714193714?content-type=application/json;feature=variation
You would need to write a script in your favourite programming language that runs through your list of positions, queries the endpoint for each one and fills in the gaps. With one request per position, I suspect this would take a long time to run.
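As a rough sketch of what such a script could look like in Python (standard library only): the function names, the `pause` parameter and the assumption that each returned variation feature carries an `alleles` list are mine, so check them against a real response before relying on this.

```python
import json
import time
import urllib.request

SERVER = "http://rest.ensembl.org"  # same server as in the example URL


def overlap_url(species, chrom, pos):
    """Build the overlap/region URL for a single-base region."""
    region = f"{chrom}:{pos}-{pos}"
    return (f"{SERVER}/overlap/region/{species}/{region}"
            "?content-type=application/json;feature=variation")


def known_alleles(species, chrom, pos):
    """Return one allele list per known variant overlapping the position.

    Assumption: each variation feature in the JSON has an "alleles" key.
    """
    with urllib.request.urlopen(overlap_url(species, chrom, pos)) as resp:
        features = json.load(resp)
    return [feature.get("alleles", []) for feature in features]


def annotate(species, positions, pause=0.1):
    """Query every (chrom, pos) pair in turn; one request per position."""
    results = {}
    for chrom, pos in positions:
        results[(chrom, pos)] = known_alleles(species, chrom, pos)
        time.sleep(pause)  # hypothetical delay to stay polite to the server
    return results
```

Calling `annotate("triticum_aestivum", [("4A", 714193714)])` would issue the same request as the example URL. Positions with no known variant come back as an empty list, so those gaps would still need filling another way.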
An alternative with the Ensembl REST API would be to use the sequence/region endpoint to get the reference allele at each locus and, as @i.sudbury suggests, use this to infer the alt: any allele in your samples that differs from the reference base. The benefit of this over the overlap endpoint is that it's also available as a POST endpoint, which lets you query in batches of 50 and so would be quicker.
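Here is a minimal Python sketch of the batched approach, again with only the standard library. It assumes the POST body is a JSON object with a `regions` list of `chrom:start..end` strings and that each item in the response has `query` and `seq` fields; the helper names are hypothetical, and it also assumes your sample alleles are reported on the forward strand of the same assembly.

```python
import json
import urllib.request

SERVER = "http://rest.ensembl.org"
BATCH_SIZE = 50  # batch limit mentioned above


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def region_string(chrom, pos):
    """Single-base region in the chrom:start..end form."""
    return f"{chrom}:{pos}..{pos}"


def fetch_reference_bases(species, positions):
    """Map each region string to its reference base, one POST per batch."""
    ref = {}
    for batch in batches(positions):
        regions = [region_string(chrom, pos) for chrom, pos in batch]
        request = urllib.request.Request(
            f"{SERVER}/sequence/region/{species}",
            data=json.dumps({"regions": regions}).encode(),  # data => POST
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
        )
        with urllib.request.urlopen(request) as resp:
            for item in json.load(resp):
                # Assumption: "query" echoes the region string we sent.
                ref[item["query"]] = item["seq"].upper()
    return ref


def infer_alt(ref_base, sample_alleles):
    """Alt alleles = alleles seen in the samples that are not the reference."""
    return sorted({a.upper() for a in sample_alleles} - {ref_base.upper()})
```

For example, `infer_alt("G", ["G", "A"])` gives `["A"]`. A site where every sample allele matches the reference returns an empty list, meaning it is monomorphic in your data and has no alt to report.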