"...an orientation for ALL genes..." I think you should ask yourself: how do we know the orientation of ANY gene? DK is on the right track in saying that it depends on how the gene structure was inferred. If someone hands you a piece of DNA and says it is a gene sequence but the strand is unknown, what would you do? (This happened to me recently, with 19,000 sequences.) There are a variety of things you can try to deduce orientation. You can look for homology to known genes, but then you have to decide on cutoffs. You can translate the sequence from each strand and look for stop codons or an initiating methionine; this will help you pick the most likely coding strand for many sequences, but not all. You can examine the codon bias of each strand. You can examine biases in base composition at each codon position for each strand (a Polish group applied this technique to yeast in the late 1990s, trying to work out which of the 6,000 or so predicted genes in the newly sequenced yeast genome were real). These are all informatic consistency checks you can use to infer strand (orientation). You could (and should) even take it one step further and ask: how do we know any given gene really is a gene?
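To make the "translate each strand and look for stop codons or an initiating methionine" check concrete, here is a minimal Python sketch. It is not anyone's published method: the function names, the `min_margin` cutoff, and the rule "the strand with the longer ATG-initiated, stop-free run wins" are all assumptions chosen for illustration, and it uses the standard stop codons (TAA, TAG, TGA) only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code only (assumption)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Longest run of codons that starts at an ATG and contains no stop,
    taken over the three reading frames of the sequence as given."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0  # codons since the most recent in-frame ATG; 0 = no open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                run = 0
            elif run > 0 or codon == "ATG":
                run += 1
                best = max(best, run)
    return best


def guess_strand(seq, min_margin=10):
    """Guess the coding strand by comparing the longest ORF on each strand.

    Returns (call, forward_orf, reverse_orf), where call is "+", "-", or "?"
    when the two strands differ by fewer than min_margin codons.
    min_margin is an arbitrary illustrative cutoff, not a recommended value.
    """
    fwd = longest_orf(seq)
    rev = longest_orf(reverse_complement(seq))
    if abs(fwd - rev) < min_margin:
        return "?", fwd, rev
    return ("+" if fwd > rev else "-"), fwd, rev
```

As noted above, this kind of check partitions many sequences but not all: short, non-coding, or partial sequences will often come back as "?", and the cutoff is something you have to decide for yourself.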
I think you won't really know the orientation of genes in a novel organism until you have several layers of evidence gathered from empirical observations (experiments). Strand-specific sequencing protocols (as mentioned by DK) let you determine which strand any given RNA transcript produced from a locus came from. Orientation-specific chromatin signatures (such as H3K4me3 at the 5' end of a gene) add another layer. Perhaps even proteomic data can tell you which peptides are produced by a given locus, and thus which orientation produced the protein (if it's coding). To answer your question: to be sure of the orientation of even a few genes, you need evidence. You can infer something about ALL from a few, but certainty for any has to be built on evidence.
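As a rough illustration of how strand-specific read evidence might be turned into a per-locus call, here is a small Python sketch. The function name, the thresholds (`min_reads`, `min_fraction`), and the `antisense_protocol` flag are hypothetical choices for this example; the flag is there because in some stranded library types the counted read maps antisense to the transcript, so the call has to be flipped.

```python
def call_strand_from_reads(plus_reads, minus_reads, min_reads=20,
                           min_fraction=0.9, antisense_protocol=False):
    """Call the transcribed strand of a locus from stranded read counts.

    plus_reads / minus_reads: reads aligned to the + and - genomic strands.
    Returns "+", "-", or "?" when there are too few reads or no clear bias.
    All thresholds are illustrative assumptions, not recommended values.
    """
    total = plus_reads + minus_reads
    if total < min_reads:
        return "?"  # not enough evidence to say anything
    plus_fraction = plus_reads / total
    if plus_fraction >= min_fraction:
        call = "+"
    elif plus_fraction <= 1 - min_fraction:
        call = "-"
    else:
        return "?"  # mixed signal, e.g. overlapping transcripts on both strands
    if antisense_protocol:
        # The counted reads are antisense to the transcript, so flip the call.
        call = "-" if call == "+" else "+"
    return call
```

A call like this is only one layer of evidence; it is most convincing when it agrees with the sequence-based checks and with any chromatin or proteomic data you have for the same locus.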