Hello experts,

I have been working with high-throughput genome sequencing data for more than 2 years, and I am confused about the REF and ALT alleles in a VCF file. How are the REF and ALT alleles determined? I searched the internet extensively but did not find any relevant information.

I ran a GWAS and identified a SNP associated with my phenotype of interest. When I annotated that SNP with a SNP annotation tool, it reported a G>A change at the position associated with the trait I studied. To confirm the result, I split the GWAS population into two groups according to the G and A alleles, but the genotypes show C in one group and T in the other, instead of G and A. My guess is that the SNP is reported on the opposite strand of the DNA, since C is complementary to G and T is complementary to A.

Please share your comments. Below are the VCF columns with the REF and ALT alleles for this SNP, followed by its annotation.
#CHROM POS ID REF ALT QUAL FILTER
9 689616 rs1023 C T . . upstream_gene_variant c.-1510G>A n.689616C>T
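A quick way to test the opposite-strand guess is to complement the VCF alleles and compare them with the annotated change. This is only a minimal sketch: it assumes the VCF reports REF and ALT on the forward strand of the reference and that the c.-1510G>A notation is written relative to a transcript on the reverse strand. The function name `complement_allele` is made up for illustration.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement_allele(allele: str) -> str:
    """Return the allele as it reads on the opposite strand (reverse complement)."""
    return allele.upper().translate(COMPLEMENT)[::-1]


# REF and ALT copied from the VCF record above.
vcf_ref, vcf_alt = "C", "T"
# Alleles taken from the annotation c.-1510G>A.
ann_ref, ann_alt = "G", "A"

# True if the VCF alleles and the annotated alleles describe the same change
# seen from opposite strands.
same_variant = (complement_allele(vcf_ref), complement_allele(vcf_alt)) == (ann_ref, ann_alt)

print(f"{vcf_ref}>{vcf_alt} on one strand reads as "
      f"{complement_allele(vcf_ref)}>{complement_allele(vcf_alt)} on the other")
print("Consistent with the annotation:", same_variant)
```

If the check passes, C>T and G>A would be the same variant described from the two strands, which also fits the annotation listing both n.689616C>T and c.-1510G>A for this record.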
Thank you very much for your response. I know that REF is the base in the reference genome and ALT is the alternative allele observed at that position in my population. But in my data, the REF and ALT bases differ from the alleles I see at that position in the population: the change there is G>A, not C>T. Here are some nucleotides around that position in the reference genome, with the G nucleotide that is changed to A in the GWAS population in bold text: GTCAAGTAGTTCGGTGAAGGGGGAT. But when I look in the VCF file, it shows C as the REF allele and T as the ALT allele. Should it not be G as REF and A as ALT? Why is C in REF and T in ALT here? I checked other positions as well and found that most REF and ALT alleles are reported this way.
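One possible check, sketched below, is to reverse-complement the flanking sequence and compare the result with the chromosome sequence around position 689616. This assumes the sequence above was copied from a gene or transcript record on the reverse strand rather than from the forward strand of the chromosome, which I have not confirmed. The helper name `reverse_complement` is hypothetical.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read 5'->3' on the opposite strand."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Flanking sequence quoted in the post (strand of origin is an assumption).
flank = "GTCAAGTAGTTCGGTGAAGGGGGAT"

# How the same region would read on the opposite strand.
forward_candidate = reverse_complement(flank)
print(forward_candidate)
```

If the printed sequence matches the forward strand of chromosome 9 around the SNP, each G in the quoted sequence corresponds to a C there, and a REF of C in the VCF would be expected.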