I am trying to obtain Drosophila variant data to test a new method on, but I have run into some issues. The main dataset I am looking at is from PopFly (http://popfly.uab.cat/). However, when I download the data and calculate allele frequencies, every variant seems to have a considerable number of N alt alleles. Because I will need to annotate these variants, this could be a big problem, and there are too many variants with N alt alleles to simply remove them from the dataset. What is the usual procedure in this situation?
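To make the problem concrete, here is a minimal sketch of one way to quantify it per site, by counting N as a missing call rather than as an alternate allele. It assumes each site has already been reduced to a plain list of haploid base calls, which may not match how the PopFly files are actually laid out; the function name `site_allele_summary` and the example calls are hypothetical.

```python
from collections import Counter


def site_allele_summary(calls, missing="N"):
    """Summarise one site from a list of haploid base calls.

    Assumption: `calls` holds one base per sampled genome, with `missing`
    (here "N") marking an uncalled base. This layout is illustrative only.
    """
    counts = Counter(c.upper() for c in calls)
    # Set the missing calls aside so they do not count as an allele.
    n_missing = counts.pop(missing, 0)
    n_called = sum(counts.values())
    # Frequencies are computed over called bases only.
    freqs = {a: n / n_called for a, n in counts.items()} if n_called else {}
    return {
        "missing_fraction": n_missing / len(calls) if calls else 0.0,
        "called": n_called,
        "frequencies": freqs,
    }


# Made-up example site: 8 genomes, 3 of them uncalled.
calls = ["A", "A", "T", "N", "N", "A", "T", "N"]
summary = site_allele_summary(calls)
print(summary)
```

With a summary like this, sites could be kept or dropped based on a missing-fraction threshold instead of discarding every site that contains any N, though whether that is appropriate is part of what I am asking.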
I have also looked at other sources such as FlyBase, but there doesn't seem to be any variant data or VCF files there. There are GFF files and full sequence information, but not the variants that I need. Does anyone have any suggestions?