I am new to bioinformatics. I am learning about SNPs and trying to use vcftools. If I use the --012 option, it generates a .012 file containing the values 0, 1 and 2, where "Genotypes are represented as 0, 1 and 2, where the number represent that number of non-reference alleles." (Ref: vcftools manual). So if the reference allele at a SNP position is C and the alternate allele is T, the possible genotypes are CC, CT, TC and TT, encoded as CC=0, CT/TC=1 and TT=2.
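To make my understanding of the encoding concrete, here is a minimal Python sketch of how I think the number is derived. The function name `count_alt_alleles` and the two-letter genotype strings are my own illustration, not anything taken from vcftools.

```python
def count_alt_alleles(genotype, ref):
    """Count the non-reference alleles in a diploid genotype string such as 'CT'.

    Hypothetical helper for illustration only; vcftools does this internally.
    """
    return sum(1 for allele in genotype if allele != ref)


# Reference allele C, alternate allele T (the example from this question).
for gt in ["CC", "CT", "TC", "TT"]:
    print(gt, count_alt_alleles(gt, "C"))
```

If this is right, CT and TC get the same code because only the count of non-reference alleles matters, not their order.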
My first question is: what does CC mean? Does it mean that at this particular position there is a nucleotide C and it is pairing with another nucleotide C on the other strand?
If so, that would not follow the base-pairing rule, would it?
Finally, the generated files contain far more 0s and 2s than 1s. Shouldn't CT/TC be a more common combination than CC and TT?
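For reference, this is roughly how the codes can be tallied. It is only a sketch: it assumes each row of the .012 file is one individual, that the first whitespace-separated field is an individual index rather than a genotype, and that the rows are passed in as plain strings. The function name `count_codes` and the two example rows are made up for illustration.

```python
from collections import Counter


def count_codes(lines):
    """Tally genotype codes across rows of .012-style text.

    Assumption: the first field of each row is an individual index,
    so it is skipped and only the remaining fields are counted.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        counts.update(fields[1:])
    return counts


# Two made-up rows, not real data.
example_rows = ["0\t0\t2\t1", "1\t2\t0\t0"]
print(count_codes(example_rows))
```

On a real file the same function could be called on the open file object, for example `count_codes(open("out.012"))`, where `out.012` is a placeholder filename.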
If I am understanding this completely wrong, some helpful links would be appreciated.
Thanks in advance.