12 substitution types versus 6 with transcribed/untranscribed
9.9 years ago

This is a very basic question, but I haven't found an answer. I have two groups of sequencing data. One reports each substitution as one of 12 possible types (4 reference bases × 3 alternative bases). The other has only 6 possible substitutions, always for the pyrimidine reference, plus the binary "transcribed" versus "untranscribed" strand. I want to compare the two systems, including neighboring bases. My first thought is to call the pyrimidines of the 12 "transcribed" and take the reverse complement of the purines, calling those "untranscribed". Does that correspond to what I'm seeing in the different data sets?

Many thanks,

Phil

sequencing • 1.4k views

It's completely unclear (to me at least) what you're trying to do and what you want. For example, "One gives substitutions from 12 possible," isn't even a complete sentence, so I haven't a clue as to its meaning. Please expand on what you're trying to do, by giving background context and perhaps some example data so we can get a better idea what you actually need.


I have data sets that use two different formats to describe single nucleotide substitutions. One format uses all 4 reference bases with each of 3 possible substitutions to give 12 possibilities in total: A>C, A>G, A>T, C>A, ... The other format uses only the pyrimidine reference, so 6 substitutions, but for each indicates whether the substitution is on the transcribed or untranscribed strand. So, for example, the former has both A>C and T>G substitutions, while the latter has T>G transcribed and T>G untranscribed but no A>C substitutions. I'm also looking at the bases on either side of the substitution. I think that, when the purine is the reference base in the first format, I should take the complement of the flanking bases as well and then reverse their order. E.g., A, A>C, G in the first format would become C (complement of G), T>G, T (complement of A).
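The conversion described above can be sketched in a few lines of Python. This is only an illustration, not code from either data set: the function name `to_pyrimidine_format` and the tuple layout (left flank, reference, alternative, right flank, flipped flag) are hypothetical. The returned flag only records whether the record was reverse complemented; treating it as "untranscribed" versus "transcribed" is the assumption proposed in this thread, not something the sketch can verify.

```python
# Hypothetical helper: collapse the 12-type format onto the 6 pyrimidine-reference types.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PYRIMIDINES = {"C", "T"}


def to_pyrimidine_format(left, ref, alt, right):
    """Return (left, ref, alt, right, flipped) with a pyrimidine reference base.

    If the reference is already a pyrimidine, the record is returned unchanged.
    If it is a purine, every base is complemented and the flanks swap sides,
    i.e. the whole context is reverse complemented.
    """
    if ref in PYRIMIDINES:
        return left, ref, alt, right, False
    return (
        COMPLEMENT[right],  # old right flank becomes the new left flank
        COMPLEMENT[ref],
        COMPLEMENT[alt],
        COMPLEMENT[left],   # old left flank becomes the new right flank
        True,
    )


# The example from the post: A, A>C, G
print(to_pyrimidine_format("A", "A", "C", "G"))  # ('C', 'T', 'G', 'T', True)
```

Keeping the flag separate from any strand label makes it easy to change the labelling later if the "flipped means untranscribed" assumption turns out not to match the second data set.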


Ah, that's one of the weirder formats I've ever seen. In that case, yes, it makes sense to just take the complement. Whether to take the reverse complement or not would depend on more information that's still unclear from your post. I'm guessing from context that, for purine substitutions on the coding strand, the reported sequence is always reverse complemented, in which case what you suggest is indeed correct.
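To make the complement versus reverse complement distinction concrete, here is a minimal sketch on a three-base context (left flank, reference base, right flank). The helper names are illustrative only. The two operations give different results whenever the flanks differ, and the same result when the flanks are identical.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base, keeping the original order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement each base and reverse the order."""
    return complement(seq)[::-1]


context = "AAG"  # left flank A, reference A, right flank G
print(complement(context))          # TTC
print(reverse_complement(context))  # CTT

symmetric = "GAG"  # identical flanks: both operations agree
print(complement(symmetric) == reverse_complement(symmetric))  # True
```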

9.9 years ago

Thanks. I'll try it that way and see if the results look sensible. If not, I'll try it without the reversal. Doing it the wrong way should give obviously nonsensical results.
