Is there a convention or nomenclature for the order of naming base pairs when describing nucleic acid structures and geometry in a publication?
3 months ago
Hi, as per the title, is there a convention for the order of naming base pairs in a publication? Are there even any rules? Naming bases from 5' to 3' seems sensible to me, yet the paper I'm referencing writes "a Watson-Crick G15•C10 base pair, a base triple of [A23•G8] •A11". Wouldn't C10•G15 and [G8•A23]•A11 make more sense? Or is there some overriding rule?

For context, this concerns the description of nucleic acid aptamer structures. The paper can be found here for anyone interested.

I believe 'Watson-Crick' in that quote is short for 'Watson-Crick geometry'. See here or here, etc. for examples. There is often no overriding rule or convention for such things. You'd have to provide more context so that those more familiar with the system can judge better. It may simply be that G15 was known for ages to be important relative to something else, and when the secondary or 3D structure was determined, it became apparent that it was base-paired to C10 in a stem. Rather than bury the reference to the position that many others in the field are familiar with, the authors placed it first to make it prominent. Or maybe G15 is in one molecule, i.e., the one featured more in the current study or studied more in the field, and C10 is in another? Without more context, that is all speculation.

Thanks for your reply. Yes, they are referring to W-C geometry; I realize now I should have added more context to this post.

I would agree with your thinking on how the order chosen here may have come about.

3 months ago

nobody should, under any circumstance, call strands Watson/Crick.

it is beyond cringy - and for that reason anyone who would call a strand Watson cannot be trusted in any other naming scheme,

not to mention I don't even understand what [A23•G8] •A11 means.

When it comes to variant calling there is an entire nomenclature on how to do it: HGVS

https://varnomen.hgvs.org/

you could use that nomenclature if you wish,

in my opinion sticking to a commonly used format like BED or GFF while mentioning the start/end basepairs would make most sense

This refers to descriptions of nucleic acid aptamer structures, so the notation needs to distinguish between Watson-Crick and Hoogsteen pairing. These are short sequences studied in isolation, not in the context of a genome.

[A23•G8]•A11 is a base triple in which three bases are coordinated with one another through Hoogsteen mismatch pairing. Given additional context in the paper, my assumption is that the bases in the square brackets form the more primary interaction, in that they share more hydrogen bonds with one another than with the base outside the square brackets. Rereading the paper, the order in [A23*G8*]*A11 is because G8 bridges between A23 and A11 in the [A23*G8*]*A11 base triple. The authors also refer to this triple as A11*[G8*A23] when the interactions of A11 are the focus, so possibly the order is determined by which nucleotide is the primary subject in each case.

Since posting I have read wikipedia.org/wiki/Nucleic_acid_nomenclature, which states that '*' should be used to denote Hoogsteen pairing, though it gives no reference for this. With that nomenclature, [A23*G8]*A11 would be more correct, but it still doesn't answer my question regarding the order of base numbering.
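
To make the ordering question concrete, here is a minimal Python sketch of what a strict 5' to 3' rewrite of these labels could look like. It is only an illustration, not an established convention or anything taken from the paper: the function names are hypothetical, it assumes all residues sit on one strand with a single numbering scheme, and it treats both '•' and '*' simply as separators.

```python
import re

# Hypothetical helpers for illustration only; not a published convention.
RESIDUE = re.compile(r"([ACGUT])(\d+)")          # e.g. "G15" -> ("G", "15")
TOKEN = re.compile(r"\[[^\]]*\]|[ACGUT]\d+")     # a bracketed group or one residue


def residues(label):
    """Return (base, position) tuples in the order they are written."""
    return [(base, int(pos)) for base, pos in RESIDUE.findall(label)]


def five_to_three(label, sep="•"):
    """Rewrite a label so residues appear in ascending position order.

    Residues inside square brackets stay together as one group; groups are
    ordered by their lowest-numbered residue. Assumes a single strand.
    """
    groups = []
    for token in TOKEN.findall(label):
        members = sorted(residues(token), key=lambda r: r[1])
        if members:
            groups.append((members, token.startswith("[")))
    groups.sort(key=lambda g: g[0][0][1])

    parts = []
    for members, bracketed in groups:
        text = sep.join(f"{base}{pos}" for base, pos in members)
        parts.append(f"[{text}]" if bracketed else text)
    return sep.join(parts)


print(five_to_three("G15•C10"))        # C10•G15
print(five_to_three("[A23•G8] •A11"))  # [G8•A23]•A11
```

Note that sorting purely by position discards exactly the information discussed above, namely which nucleotide the authors treat as the primary subject, which may be one reason a strict 5' to 3' order is not always used.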

I will look into those formats, cheers.