Different Assemblies, Reference Alleles And Forward And Negative Strands
13.4 years ago
Andrea_Bio ★ 2.8k

Hello

I am having trouble reading some dbSNP records, and I thought I should check my understanding of a few things first.

I thought positive and negative strand descriptions were always meant with regard to a reference genome (where the positive strand is the strand running in the 5' to 3' direction which has the shortest distance to the centromere, as described here). Is it correct that the positive and negative strand conventions always refer to a reference genome?

I believe different genome versions from different sequencing consortia can have different alleles at a locus. I believe the reference allele (in a heterozygote) is the allele that appears most often in the clones sequenced. This is just arbitrary: it is simply the allele with the highest coverage in the clones sequenced, and it is not necessarily the major allele of a SNP. If two consortia sequence the same clones they should get the same reference allele, as each base has the same coverage, but if a sequencing group sequences a different set of clones they could get a different reference allele at a base position. Is this correct?

Thanks a lot.

assembly strand reference • 5.3k views
13.3 years ago

The sign of the strand has to do with the reading direction of the DNA sequence. It is a basic molecular genetics convention that comes from the synthesis of RNA, which proceeds in the 5' → 3' direction. By this convention, a DNA sequence read in the 5' → 3' direction is referred to as "forward" or "+", and a DNA sequence read in the 3' → 5' direction is referred to as "reverse" or "-".
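As a small illustration of what switching strands means in practice, here is a minimal Python sketch. The helper name `reverse_complement` and the example sequence are made up for illustration, and it assumes plain A/C/G/T input with no IUPAC ambiguity codes. Both strands are written 5' → 3', so the opposite strand is obtained by complementing each base and reversing the order.

```python
# Translation table for unambiguous DNA bases only (assumption: no IUPAC codes).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5' -> 3'."""
    # Complement every base, then reverse so the result also reads 5' -> 3'.
    return seq.translate(COMPLEMENT)[::-1]


plus_strand = "ATGCCG"                           # hypothetical "+" strand sequence
minus_strand = reverse_complement(plus_strand)   # same locus seen from "-"
print(plus_strand, minus_strand)                 # ATGCCG CGGCAT
```

Applying the function twice gives back the original sequence, which is a quick sanity check when converting coordinates or alleles between strands.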

The reference allele is another convention, used to normalize SNP calls. It points to the allele on the reference contig of the reference genome used. Although the reference contig may be on the forward strand, a SNP may be defined on the reverse strand (or vice versa), in which case the reference allele is the complement of the base reported for the SNP. We in fact use our own concept of an "oriented reference allele": the base from the reference sequence, expressed on the same strand as the SNP.
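The "oriented reference allele" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual code: the function name, the "+"/"-" strand encoding, and the example record are all hypothetical, and only single-base A/C/G/T alleles are handled.

```python
# Single-base complements (assumption: SNP alleles are plain A/C/G/T).
COMPLEMENT_BASE = {"A": "T", "C": "G", "G": "C", "T": "A"}


def oriented_reference_allele(ref_base: str, ref_strand: str, snp_strand: str) -> str:
    """Express the reference base on the strand the SNP is reported on."""
    for strand in (ref_strand, snp_strand):
        if strand not in ("+", "-"):
            raise ValueError(f"unknown strand: {strand!r}")
    base = ref_base.upper()
    # Same strand: the reference base can be compared with the SNP alleles directly.
    if ref_strand == snp_strand:
        return base
    # Opposite strands: complement the reference base first.
    return COMPLEMENT_BASE[base]


# Hypothetical record: reference contig on "+", SNP reported as A/G on "-".
snp_alleles = {"A", "G"}
oriented = oriented_reference_allele("C", ref_strand="+", snp_strand="-")
print(oriented, oriented in snp_alleles)   # G True
```

In this made-up record the raw reference base C is not one of the reported alleles A/G, but its complement G is, which is the kind of mismatch that disappears once the strand is taken into account.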

The reason why a reference contig was read in forward or in reverse may well be what you suggest, but for us as end users that reason is not relevant. What we do have to pay attention to is the strand of the reference allele if we want to use it as our reference. To read a little more about reference alleles, there is a note on ancestral alleles in NCBI's SNP FAQ Archive that may be clarifying.
