Common pattern for several amino acids
9.3 years ago
tretyacv ▴ 40

Hello,

I am trying to generate a nucleotide motif that encodes a chosen set of amino acids. For example, histidine is encoded by CAT and CAC, and arginine by CGT, CGC, CGA, CGG, AGA and AGG. Together these codons follow the pattern:

Position 1 of the codon: C or A

Position 2 of the codon: A or G

Position 3 of the codon: A, T, C or G

That rule covers my chosen amino acids (H and R), but it also admits amino acids I don't want (for example, AAA is lysine and AAT is asparagine). So I need a pattern that matches only my chosen amino acids. In the case above it could be [C][A or G][T], which encodes only histidine (CAT) and arginine (CGT) and no other amino acid.

I am trying to work out an algorithm that does this for any set of amino acids I choose (more than two). If no pattern exists for the full set, it should find patterns for fewer of them instead (for example, if no pattern exists for 5 amino acids, it should find the patterns for 4 amino acids from the query). This final optimization step is probably the hardest part. Any suggestions? Thanks a lot, and sorry for my poor English.
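One way to attack the optimization part (not something proposed in this thread, just a sketch) is brute force: there are only 15 non-empty nucleotide mixes per codon position, so 15^3 = 3375 candidate degenerate codons. The sketch expands each candidate, translates it with the standard genetic code, rejects any that yield an off-target amino acid or a stop codon, and keeps the candidates covering the most targets. The names `best_degenerate_codons`, `encoded` and `fmt` are hypothetical, and it assumes the standard genetic code and one degenerate codon per library position.

```python
from itertools import combinations, product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 15 non-empty nucleotide mixes that could be added at one codon position.
MIXES = ["".join(s) for r in range(1, 5) for s in combinations("ACGT", r)]


def encoded(mix3):
    """Amino acids (and '*' for stop) from every codon the three mixes can form."""
    return {CODON_TABLE["".join(c)] for c in product(*mix3)}


def fmt(mix3):
    """Render a mix as a regex-like string, e.g. ('C', 'AG', 'T') -> 'C[AG]T'."""
    return "".join(m if len(m) == 1 else f"[{m}]" for m in mix3)


def best_degenerate_codons(targets):
    """Mixes that encode only target amino acids, covering as many targets as possible.

    If no single mix covers every target, the largest achievable subset wins,
    i.e. the 'fall back to fewer amino acids' behaviour asked for above.
    """
    targets = set(targets)
    best_n, best = 0, []
    for mix3 in product(MIXES, repeat=3):  # 15**3 = 3375 candidates
        aas = encoded(mix3)
        if not aas <= targets:  # off-target amino acid or stop codon
            continue
        if len(aas) > best_n:  # new best coverage: drop smaller solutions
            best_n, best = len(aas), []
        if len(aas) == best_n:
            best.append((frozenset(aas), fmt(mix3)))
    return best


for aas, pattern in best_degenerate_codons("HR"):
    print(sorted(aas), pattern)
```

For H and R this reports C[AG]C, C[AG]T and C[AG][CT]. For a set such as H, R and K, no single mix covers all three (any mix producing both CA[TC] and AA[AG] also produces CA[AG], i.e. Q), so it falls back to the best two-amino-acid mixes. The search is exhaustive, so it is exact for one codon position; it does not consider codon usage or how evenly the mix splits between amino acids.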

python amino acid codon randomization library

Hello tretyacv!

It appears that your post has been cross-posted to another site: http://stackoverflow.com/questions/27603128

This is typically not recommended as it runs the risk of annoying people in both communities.


Yes, that's true, I'm sorry. I thought Stack Overflow was more about computer science and Biostars was a bioinformatics site, so the communities would not overlap.


You are correct in assuming that Biostars specializes in bioinformatics, but people here are also active on Stack Overflow, because geekiness has no boundaries :)

9.3 years ago
Ram 43k

I'd solve it this way:

H: CAT, CAC

R: CGT, CGC, CGA, CGG, AGA, AGG

H or R: (CAT|CAC|CGT|CGC|CGA|CGG|AGA|AGG)  # matches only the codons you need

You can always build a dict mapping each amino acid to its codons (AA: List<codons>), then form the pattern by joining the codons of all target amino acids with the delimiter '|' and wrapping the whole set in '(' and ')'.
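A minimal Python sketch of that suggestion, assuming a hand-written `CODONS` dict (only H and R filled in here) and hypothetical helper names. It checks the sequence codon by codon in frame 0, since running the alternation over the raw string with `re.search` could also match across codon boundaries.

```python
import re

# Codon lists per amino acid (standard genetic code); extend with other amino acids as needed.
CODONS = {
    "H": ["CAT", "CAC"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
}


def codon_pattern(targets, codons=CODONS):
    """Join every codon of the target amino acids with '|' and wrap the set in '(' ')'."""
    alternatives = [c for aa in targets for c in codons[aa]]
    return re.compile("(" + "|".join(alternatives) + ")")


def in_frame_matches(seq, pattern):
    """Return the in-frame (frame 0) codons of seq that the pattern matches exactly."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3) if pattern.fullmatch(seq[i:i + 3])]


pattern = codon_pattern("HR")
print(pattern.pattern)  # (CAT|CAC|CGT|CGC|CGA|CGG|AGA|AGG)
print(in_frame_matches("CATAAACGGTTT", pattern))  # ['CAT', 'CGG']
```

Note that this matches a list of explicit codons; it does not by itself give a per-position mix that a synthesis lab can use.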


But how do I generate the pattern? I need to create a library of DNA sequences in which, for example, the third codon is this universal pattern for histidine or arginine, and I need to tell the organic chemists which nucleotide to add next during synthesis: first C, then a mix of A and G, and finally only T. This recipe yields DNA sequences whose codon at that position encodes only arginine or histidine. And of course I need to generalize this, not only for R and H.
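Once a per-position mix is known, a common way to write it down for a synthesis lab is the IUPAC nucleotide ambiguity code (R = A/G, Y = C/T, K = G/T, N = any base, and so on), which many oligo suppliers accept; the exact notation should be confirmed with the provider. This sketch uses a hypothetical `synthesis_recipe` helper that turns a mix such as C, A/G, T into both the IUPAC codon and a step list in codon order (5' to 3').

```python
# IUPAC nucleotide ambiguity codes, keyed by the sorted bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def synthesis_recipe(mix3):
    """Return the IUPAC codon and per-position instructions for a degenerate codon.

    mix3 holds the bases allowed at codon positions 1-3, e.g. ("C", "AG", "T").
    """
    keys = ["".join(sorted(set(m.upper()))) for m in mix3]
    steps = []
    for pos, key in enumerate(keys, start=1):
        what = key if len(key) == 1 else "mix of " + "/".join(key)
        steps.append(f"codon position {pos}: {what}")
    return "".join(IUPAC[k] for k in keys), steps


codon, steps = synthesis_recipe(("C", "AG", "T"))
print(codon)  # CRT
for step in steps:
    print(step)
```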


IMO that is only possible if the codons share more, rather than fewer, nucleotides at corresponding positions. The more diverse the codon set, the more false matches you will get. Looking at this nucleotide by nucleotide cannot, in general, yield an exact solution, because codons are read as units of three: the nucleotides allowed at one position combine with those allowed at the other two.

Even if you manually created patterns for all combinations, you'd end up with false positives, unless you restrict yourself to a subset of the codons (assuming such an overlapping subset, with at least one common nucleotide, exists in the first place). For example, with R and H, if you chose CAT, CAC, CGT and CGC, you could use C[AG][TC], which lets you go position by position while matching only R and H.
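The false-positive effect can be checked directly: take the union of nucleotides seen at each codon position, expand every combination, and translate. A minimal sketch with hypothetical helper names, assuming the standard genetic code:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def positionwise_mix(codons):
    """Union of the nucleotides seen at each of the three codon positions."""
    return tuple("".join(sorted({c[i] for c in codons})) for i in range(3))


def encoded_by_mix(mix3):
    """Translate every codon the position-wise mix can produce."""
    return {CODON_TABLE["".join(c)] for c in product(*mix3)}


all_hr = ["CAT", "CAC", "CGT", "CGC", "CGA", "CGG", "AGA", "AGG"]
subset = ["CAT", "CAC", "CGT", "CGC"]

mix = positionwise_mix(all_hr)
print(mix, sorted(encoded_by_mix(mix)))  # ('AC', 'AG', 'ACGT') ['H', 'K', 'N', 'Q', 'R', 'S']

mix = positionwise_mix(subset)
print(mix, sorted(encoded_by_mix(mix)))  # ('C', 'AG', 'CT') ['H', 'R']
```

Mixing all eight H/R codons position by position also yields K, N, Q and S, while the four-codon subset gives C[AG][CT], which encodes only H and R.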

I think this might be possible, but you'll invest more time creating the algorithm than you'll save by using it.
