Question: How to count the number of codons in a frame or order using the equation?
0
8 weeks ago by
projetoic0 wrote:

Let's consider these values

``````
T/U=1, C=2, A=3, G=4
``````

And this codon order: [image not preserved]

The input is a file.fasta that has the following data:

``````
>id
atgatg
``````

I will open and read the file and analyze it three characters at a time. The first analysis starts with atg ...

Then I apply an equation to each codon that tells me which position in the previously established order its count belongs to:

``````
((P1 - 1)*16) + P2 + ((P3-1)*4)
``````

P1 is position 1 of atg. The base in position 1 of atg is A, so P1 has a value of 3 (since A = 3, G = 4, ...).

So the equation becomes:

``````
((3 - 1)*16) + P2 + ((P3-1)*4)
``````

P2 is position 2 of atg, which is t. T has a value of 1.

So the equation becomes:

``````
((3 - 1) * 16) + 1 + ((P3-1) * 4)
``````

P3 is position 3 of atg, which is g. G has a value of 4.

So the equation becomes:

``````
((3 - 1) * 16) + 1 + ((4-1) * 4)
``````

Evaluating it gives:

``````
((3 - 1) * 16) + 1 + ((4-1) * 4) = 45
``````
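The substitution above can be checked with a few lines of Python. This is only a minimal sketch: the names `VALUES` and `codon_position` are made up for illustration, and the mapping is the one stated at the top of the question (T/U=1, C=2, A=3, G=4).

```python
# Nucleotide values as given in the question: T/U=1, C=2, A=3, G=4.
VALUES = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}


def codon_position(codon):
    """Return the 1-based position of a codon using the question's equation."""
    p1, p2, p3 = (VALUES[base] for base in codon.upper())
    return ((p1 - 1) * 16) + p2 + ((p3 - 1) * 4)


print(codon_position("atg"))  # 45
```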

So we add 1 to the count in position 45, which is AUG, and I will write this data in a table:

``````
AUG  1
``````

When we read the second atg, it becomes:

``````
AUG : 2
``````

With this example the output would look like this:

``````
ATG: 0 TAT: 0 TAA: 0 AUG : 2...
``````

What I have managed to do so far is:

``````
with open('file.fasta') as fasta:
    coteudo = str.maketrans({'A':'1', 'G':'2', 'C':'3', 'U':'4'})
``````

If you want to understand better you can access this link: http://codonw.sourceforge.net/DataRecoding.html

How can I solve this problem?

code biopython python • 222 views
modified 8 weeks ago by Jorge Amigo12k • written 8 weeks ago by projetoic0
1
8 weeks ago by
Devon Ryan98k
Freiburg, Germany
Devon Ryan98k wrote:

That equation is a custom hash function that someone came up with to ensure that codons hash to unique sequential values for memory efficiency. In a modern language like Python there are built-in hash functions, so one could instead just do:

``````
someSequence = 'atgatg'
d = dict()
for idx in range(0, len(someSequence), 3):  # range replaces Python 2's xrange
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))
``````
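The same tally could also be written with `collections.Counter` from the standard library, which removes the explicit `if codon not in d` check. A minimal sketch, assuming a single in-frame sequence; `count_codons` is a hypothetical helper name, and dropping a trailing partial codon is a choice made here rather than something the answer specifies.

```python
from collections import Counter


def count_codons(sequence):
    """Count non-overlapping codons in frame 0, ignoring a trailing partial codon."""
    sequence = sequence.upper()
    usable = len(sequence) - len(sequence) % 3  # drop 1-2 leftover bases
    return Counter(sequence[i:i + 3] for i in range(0, usable, 3))


counts = count_codons("atgatg")
for codon, cnt in counts.items():
    print("{}: {}".format(codon, cnt))  # ATG: 2
```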

If for some reason you absolutely HAD to use this custom hashing function, then you have to use a vector of values:

``````
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1  # Note the conversion to 0-based indexing!

someSequence = 'atgatg'
v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    v[customHash(codon)] += 1  # hash each codon to find its slot in v
``````

Now `v` contains the counts in the custom order, from `UUU` (index 0) to `GGG` (index 63). There are probably some typos in there. Finding and correcting these errors can be an exercise for you.
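To see which codon each slot of `v` belongs to, the hash can be inverted. The sketch below is an illustration under the same value assignment (T/U=1, C=2, A=3, G=4); `BASES` and `index_to_codon` are hypothetical names, and writing bases as U rather than T is an arbitrary choice.

```python
BASES = "UCAG"  # string index 0..3 corresponds to the values 1..4 (U/T=1, C=2, A=3, G=4)


def index_to_codon(index):
    """Invert ((P1-1)*16) + P2 + ((P3-1)*4) - 1 for a 0-based index in 0..63."""
    p1, rest = divmod(index, 16)  # P1 - 1, and what is left for P2 and P3
    p3, p2 = divmod(rest, 4)      # P3 - 1 and P2 - 1
    return BASES[p1] + BASES[p2] + BASES[p3]


print(index_to_codon(0))   # UUU
print(index_to_codon(44))  # AUG
print(index_to_codon(63))  # GGG
```

With such a helper, `[(index_to_codon(i), n) for i, n in enumerate(v)]` would pair each count in `v` with its codon in the custom order.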

``````
# Script to count codons

someSequence = 'atguuucccggggtataaggcaaaa'
d = dict()
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))

# Script to order the codons

valueConversion = {'A': 1, 'G': 2, 'C': 3, 'U': 4, 'T': 4}
def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return  P3 + 4*(P2-1) + 16*(P1 - 1) # Note the conversion to 0-based indexing!

v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    v[0]+= 1

ordenacao = customHash(someSequence)
print(ordenacao)
print(v)
``````

Unfortunately, I am not getting the expected output. I would like the output to be the codon frequencies in the order I defined with the new formula, but I only get something like [1, 0, 0, 0, 0, 0, 0, 0, 0,], whereas I would like to get ['CODON' = FREQUENCY], for example ['AAA' = 1, 'AAG' = 2, 'AAC' = 1].

I have some sequences that were concatenated into one file with the Linux `cat` command. I would like to pass that concatenated file to this code and generate the frequencies for each of the sequences.

2

Further help comes at a cost of 500 euro per hour (or fraction thereof). I imagine you'll want to code the last bit yourself...

ok. Thanks! :)

Did you check my answer? [sigh]

``````
$ echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g;
> END { foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
> $final =~s/, $//; print "[$final]" }'
["AAA" = 1, "ATG" = 1, "CCC" = 1, "GGC" = 1, "GGG" = 1, "GTA" = 1, "TAA" = 1, "UUU" = 1]
``````

I get:

``````
syntax error at -e line 2, near ">"
syntax error at -e line 3, near ""[$final]" }"
Execution of -e aborted due to compilation errors.
``````

What do these errors mean? I can't see the output...

What I wrote in my comment was exactly what you could read on your screen, so the `$` and the `>` characters at the beginning of each line are not meant to be copied and pasted. If you want to do so, then you have to copy and paste this:

``````
echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g; END {
foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
$final =~ s/, $//; print "[$final]" }'
``````

Thanks! ;)

``````
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1  # Note the conversion to 0-based indexing!

someSequence = 'atgatgaaauuu'
d = dict()

v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    v[0] += 1
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

odernation = customHash(someSequence)

for codon, cnt in d.items():
    print('{}. {}: {}'.format(odernation,codon, cnt))
``````

output:

``````
44. ATG: 2
44. AAA: 1
44. UUU: 1
``````

It always comes out as 44, and the codons are not in order.

Can you give me a hint of what I'm doing wrong? Or what can I do or study to fix it? What should I learn?
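A likely cause of the constant 44: `customHash(someSequence)` is called once on the whole 12-letter string, so it only ever looks at its first three letters (`atg`, which hashes to 44), and `v[0] += 1` always increments the first slot regardless of the codon. A hedged sketch of one way to address both, hashing each codon separately and printing in hash order; `custom_hash` and `ordered_codon_counts` are illustrative names, and skipping a trailing partial codon is an assumption made here.

```python
VALUE_CONVERSION = {"T": 1, "U": 1, "C": 2, "A": 3, "G": 4}


def custom_hash(codon):
    """0-based position of a codon: ((P1-1)*16) + P2 + ((P3-1)*4) - 1."""
    p1, p2, p3 = (VALUE_CONVERSION[base] for base in codon.upper())
    return ((p1 - 1) * 16) + p2 + ((p3 - 1) * 4) - 1


def ordered_codon_counts(sequence):
    """Return (position, codon, count) tuples sorted by custom position."""
    counts = {}
    for idx in range(0, len(sequence) - 2, 3):  # skip a trailing partial codon
        codon = sequence[idx:idx + 3].upper()
        counts[codon] = counts.get(codon, 0) + 1
    # Hash each codon individually, then sort by that position.
    return sorted((custom_hash(c), c, n) for c, n in counts.items())


for position, codon, cnt in ordered_codon_counts("atgatgaaauuu"):
    print("{}. {}: {}".format(position, codon, cnt))
# 0. UUU: 1
# 42. AAA: 1
# 44. ATG: 2
```

Note that a sequence mixing T and U would produce separate entries (for example `ATG` and `AUG`) that share the same position.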

1
8 weeks ago by
Jorge Amigo12k
Santiago de Compostela, Spain
Jorge Amigo12k wrote:

As Devon says, you don't need that equation to count codons. If the FASTA sequence is serialized, you can just use this Perl one-liner:

``````
echo "atgatggtagtacatcatcat" | perl -lne '$cs{$1}++ while /(...)/g;
END { foreach $c (sort keys %cs) { print uc($c).": $cs{$c}"  } }'
``````