I am having trouble reading some dbSNP records, so I thought I should check my understanding of a few things first.
I thought positive and negative strand descriptions were always defined with respect to a reference genome (where the positive strand is the strand running in the 5' to 3' direction which has the shortest distance to the centromere, as described here). Is it correct that the positive and negative strand conventions always refer to a reference genome?
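As a concrete check on the strand question: my understanding is that an allele reported on the negative strand corresponds to its reverse complement on the positive strand. Here is a minimal Python sketch of that conversion, assuming hypothetical "+"/"-" strand labels (these are not dbSNP's actual field names):

```python
# Complement table for DNA bases (N = unknown base), upper and lower case.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the same sequence as read on the opposite strand (5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def to_plus_strand(allele: str, strand: str) -> str:
    """Express an allele on the '+' strand.

    The '+'/'-' labels are a hypothetical input format for illustration,
    not dbSNP's own field names.
    """
    if strand == "+":
        return allele
    if strand == "-":
        return reverse_complement(allele)
    raise ValueError(f"unknown strand: {strand!r}")


print(to_plus_strand("A", "-"))    # T
print(to_plus_strand("AGT", "-"))  # ACT
```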
I believe different genome versions produced by different sequencing consortia can have different alleles at a given locus. I also believe the reference allele (in a heterozygote) is the allele that appears most often in the clones sequenced. This is essentially arbitrary: it is simply the allele with the highest coverage in the sequenced clones, and it is not necessarily the major allele of a SNP. If two consortia sequence the same clones, they should get the same reference allele, since each base has the same coverage; but if a sequencing group sequences a different set of clones, it could get a different reference allele at a given base position. Is this correct?
Thanks a lot.