Question: Different Assemblies, Reference Alleles And Forward And Negative Strands
Andrea_Bio wrote (8.2 years ago):

Hello

I am having trouble reading some dbSNP records, and I thought I should check my understanding of a few things first.

I thought positive and negative strand descriptions were always defined with respect to a reference genome (where the positive strand is the strand running in the 5' to 3' direction which has the shortest distance to the centromere, as described here). Is it correct that the positive and negative strand conventions always refer to a reference genome?

I believe different genome versions from different sequencing consortia can have different alleles at a locus. I also believe the reference allele (at a heterozygous site) is the allele that appears most often in the clones sequenced. This is just arbitrary: it is simply the allele with the highest coverage in the clones sequenced, not necessarily the major allele of the SNP. If two consortia sequence the same clones, they should get the same reference allele, since each base has the same coverage; but if a group sequences a different set of clones, it could get a different reference allele at a given base position. Is this correct?

Thanks a lot.

Jorge Amigo (Santiago de Compostela, Spain) wrote (8.2 years ago):

The sign of the strand has to do with the reading direction of the DNA sequence. It is a basic molecular genetics convention tied to the fact that RNA synthesis proceeds in the 5' → 3' direction. This convention states that a DNA sequence read in the 5' → 3' direction along the reference coordinates is referred to as "forward" or "+", and the complementary strand, which runs 3' → 5' along those same coordinates, is referred to as "reverse" or "-".
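
Since each strand is conventionally written 5' → 3', the sequence of the "-" strand over a region is the reverse complement of the "+" strand sequence. A minimal Python sketch of that relationship, assuming uppercase or lowercase A/C/G/T/N input only; the fragment below is made up for illustration:

```python
# Base-pairing table: A<->T, C<->G; N (unknown base) stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, itself written 5' -> 3'."""
    # Complement every base, then reverse so the result reads 5' -> 3'.
    return seq.upper().translate(COMPLEMENT)[::-1]


plus_strand = "ATGCGTTA"  # made-up + strand fragment, written 5' -> 3'
minus_strand = reverse_complement(plus_strand)
print(minus_strand)  # TAACGCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check that the two strands carry the same information.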

The reference allele is another convention, used to normalize SNP calls: it is the allele found on the reference contig of the reference genome being used. Although the reference contig may be on the forward strand, a SNP may be defined on the reverse strand (or vice versa), and in that case the reference allele is the complement of the corresponding reported SNP allele. In fact, we use our own concept of an "oriented reference allele": the base from the reference sequence, taken on the same strand as the SNP.
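
A minimal sketch of the "oriented reference allele" idea, assuming we already know the reference base at the SNP position on the "+" strand of the assembly and the strand on which the SNP alleles are reported. The function name and inputs are hypothetical, not part of any dbSNP tool:

```python
# Watson-Crick complements for single bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def oriented_reference_allele(ref_base_plus: str, snp_strand: str) -> str:
    """Return the reference base expressed on the same strand as the SNP.

    ref_base_plus: reference base at the SNP position, read from the + strand.
    snp_strand:    strand on which the SNP alleles are reported, "+" or "-".
    """
    base = ref_base_plus.upper()
    if snp_strand == "+":
        return base  # same strand: use the reference base as-is
    if snp_strand == "-":
        return COMPLEMENT[base]  # opposite strand: use its complement
    raise ValueError(f"unknown strand: {snp_strand!r}")


# Hypothetical SNP reported as A/G on the - strand; + strand reference base is T.
print(oriented_reference_allele("T", "-"))  # A, i.e. the SNP's A allele is the reference
```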

The reason why a reference contig was read forward or reverse may well be what you suggest, but for us as end users that reason is not relevant. What we do have to pay attention to is the strand of the reference allele if we want to use it as our reference. For a little background on reference alleles, there is a note on ancestral alleles in NCBI's SNP FAQ Archive that may help clarify things.
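
In practice, paying attention to the strand often means bringing the SNP alleles onto the "+" strand before comparing them with the reference base. A minimal sketch under stated assumptions: single-base alleles, a known reporting strand, and hypothetical helper names. It also flags A/T and C/G SNPs, whose allele pair is the same on both strands, so a strand mix-up cannot be detected from the alleles alone:

```python
# Watson-Crick complements for single bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_plus_strand(alleles, strand):
    """Express single-base SNP alleles on the + strand of the reference."""
    alleles = [a.upper() for a in alleles]
    if strand == "+":
        return alleles
    if strand == "-":
        return [COMPLEMENT[a] for a in alleles]
    raise ValueError(f"unknown strand: {strand!r}")


def is_strand_ambiguous(alleles):
    """True for A/T and C/G SNPs, which look identical on either strand."""
    return {a.upper() for a in alleles} in ({"A", "T"}, {"C", "G"})


def check_against_reference(alleles, strand, ref_base_plus):
    """True if one of the SNP alleles, flipped to +, matches the reference base."""
    return ref_base_plus.upper() in to_plus_strand(alleles, strand)


snp = ["A", "G"]  # hypothetical alleles reported on the - strand
print(to_plus_strand(snp, "-"))                # ['T', 'C']
print(check_against_reference(snp, "-", "T"))  # True
print(is_strand_ambiguous(["A", "T"]))         # True
```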
