Question: How to count the codons in a reading frame and order them using an equation?
8 weeks ago, projetoic wrote:

Consider these nucleotide values:

T/U=1, C=2, A=3, G=4

And this codon order: [image: codon order]

The input is a FASTA file (file.fasta) containing the following data:

>id
atgatg

I open and read the file and analyze the sequence three characters at a time. The first codon analyzed is atg, and so on.
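Reading "three characters at a time" can be sketched as a helper that splits a sequence into non-overlapping codons in reading frame 1. This is a minimal sketch: `split_codons` is a hypothetical helper name, and it assumes that one or two leftover bases at the end should simply be ignored.

```python
def split_codons(sequence):
    """Hypothetical helper: split a sequence into non-overlapping frame-1 codons.

    Trailing bases that do not form a complete codon are dropped.
    """
    sequence = sequence.strip().upper()
    usable = len(sequence) - len(sequence) % 3  # length rounded down to a multiple of 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


print(split_codons("atgatg"))  # ['ATG', 'ATG']
```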

Then I plug each codon into an equation that tells me where, in the order established above, its count goes:

((P1 - 1)*16) + P2 + ((P3-1)*4)

P1 = the value of position 1 of atg. Position 1 of atg is A, so P1 = 3 (since A = 3, G = 4, ...).

So the equation becomes:

((3 - 1)*16) + P2 + ((P3-1)*4)

P2 = the value of position 2 of atg, which is t. T has a value of 1, so P2 = 1.

So the equation becomes:

((3 - 1) * 16) + 1 + ((P3-1) * 4)

P3 = the value of position 3 of atg, which is g. G has a value of 4, so P3 = 4.

So the equation becomes:

((3 - 1) * 16) + 1 + ((4-1) * 4)

Evaluating it gives:

((3 - 1) * 16) + 1 + ((4-1) * 4) = 45
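The same arithmetic can be sketched in Python. `codon_position` is a hypothetical function name; it returns the 1-based position from the equation above using the values T/U=1, C=2, A=3, G=4, and the last lines check that the 64 codons land on 64 distinct positions.

```python
from itertools import product

# Nucleotide values from the question (T and U share the same value)
VALUES = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}


def codon_position(codon):
    """Hypothetical helper: 1-based position ((P1 - 1)*16) + P2 + ((P3 - 1)*4)."""
    p1, p2, p3 = (VALUES[base] for base in codon.upper())
    return ((p1 - 1) * 16) + p2 + ((p3 - 1) * 4)


print(codon_position("atg"))  # 45

# Sanity check: every codon gets a distinct position from 1 to 64
positions = {codon_position(''.join(c)) for c in product("TCAG", repeat=3)}
print(sorted(positions) == list(range(1, 65)))  # True
```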

So, when counting, we add 1 at position 45, which is AUG, and record this in a table:

AUG: 1

When we read the second atg, it becomes:

AUG : 2

For this example, the output would look like this:

ATG: 0 TAT: 0 TAA: 0 AUG : 2...
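A minimal sketch of this counting step, assuming the same values and equation: counts go into a 64-element list, with position N (1-based) stored at list index N - 1, and each slot is labelled with its codon. The helper names are hypothetical, and labels are written with U; because T and U share a value, ATG and AUG fall into the same slot and are counted together. Only non-zero slots are printed here, for brevity.

```python
from itertools import product

VALUES = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}


def codon_position(codon):
    """Hypothetical helper: 1-based position from ((P1 - 1)*16) + P2 + ((P3 - 1)*4)."""
    p1, p2, p3 = (VALUES[base] for base in codon.upper())
    return ((p1 - 1) * 16) + p2 + ((p3 - 1) * 4)


def count_in_order(sequence):
    """Hypothetical helper: list of (codon, count) pairs in the equation's order."""
    counts = [0] * 64
    sequence = sequence.upper()
    for i in range(0, len(sequence) - 2, 3):  # skip an incomplete trailing codon
        counts[codon_position(sequence[i:i + 3]) - 1] += 1  # 1-based -> 0-based
    # Label every slot with its codon (written with U)
    labels = [''] * 64
    for bases in product("UCAG", repeat=3):
        codon = ''.join(bases)
        labels[codon_position(codon) - 1] = codon
    return list(zip(labels, counts))


table = count_in_order("atgatg")
print(' '.join(f"{codon}: {n}" for codon, n in table if n))  # AUG: 2
```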

What I have managed to do so far:

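with open('file.fasta') as fasta:
    # Skip the '>id' header line and join the sequence lines
    conteudo = ''.join(line.strip() for line in fasta if not line.startswith('>'))

# Map each nucleotide to the values above (T/U=1, C=2, A=3, G=4)
tabela = str.maketrans({'T': '1', 'U': '1', 'C': '2', 'A': '3', 'G': '4'})
conteudo = conteudo.upper().translate(tabela)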

For more background on this scheme, see: http://codonw.sourceforge.net/DataRecoding.html

How can I solve this problem?

8 weeks ago, Devon Ryan (Freiburg, Germany) wrote:

That equation is a custom hash function that someone came up with so that codons hash to unique, sequential values for memory efficiency. Modern languages like Python have built-in hash functions (used by dictionaries), so one could instead just do:

someSequence = 'atgatg'
d = dict()
for idx in range(0, len(someSequence), 3):  # Python 3: range() replaces xrange()
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))
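The dictionary loop above can also be written with `collections.Counter` from Python's standard library, which handles the "create the key if it is missing, then increment" step. A minimal sketch of that alternative, reading non-overlapping frame-1 codons and skipping an incomplete last codon:

```python
from collections import Counter

someSequence = 'atgatg'
# Non-overlapping triplets from position 0; len - 2 skips an incomplete last codon
codons = (someSequence[i:i + 3].upper() for i in range(0, len(someSequence) - 2, 3))
d = Counter(codons)

for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))  # ATG: 2
```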

If for some reason you absolutely HAD to use this custom hash function, then you would store the counts in a vector (a list) indexed by the hash value:

valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1  # Note the conversion to 0-based indexing!

someSequence = 'atgatg'
v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence) - 2, 3):  # skip an incomplete last codon
    codon = someSequence[idx:idx+3].upper()
    v[customHash(codon)] += 1  # hash the codon, not the function itself

Now v contains the counts ordered from TTT/UUU (index 0) to GGG (index 63). There are probably some typos in there. Finding and correcting these errors can be an exercise for you.
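To label the entries of v, a 0-based index can be turned back into a codon by reversing the formula: customHash equals (P1 - 1)*16 + (P3 - 1)*4 + (P2 - 1), so two `divmod` calls recover the three values. This is a sketch; `decode_index` and `LETTERS` are hypothetical names, and codons are written with T (a U codon would share the same index).

```python
# Hypothetical lookup: letters indexed by value - 1, matching T/U=1, C=2, A=3, G=4
LETTERS = 'TCAG'
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}


def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1) * 16) + P2 + ((P3 - 1) * 4) - 1  # 0-based index


def decode_index(index):
    """Hypothetical helper: invert customHash, 0-based index -> codon (with T)."""
    p1, rest = divmod(index, 16)  # rest = (P3 - 1)*4 + (P2 - 1)
    p3, p2 = divmod(rest, 4)
    return LETTERS[p1] + LETTERS[p2] + LETTERS[p3]


print(decode_index(0), decode_index(63))  # TTT GGG
print(decode_index(customHash('ATG')))    # ATG
```

With this, `[decode_index(i) for i in range(64)]` gives the codon label for each entry of v.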

(answer by Devon Ryan, 8 weeks ago)
# Script to count codons

someSequence = 'atguuucccggggtataaggcaaaa'
d = dict()
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

for codon, cnt in d.items():
    print('{}: {}'.format(codon, cnt))

# Script to order the codons

valueConversion = {'A': 1, 'G': 2, 'C': 3, 'U': 4, 'T': 4}
def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return  P3 + 4*(P2-1) + 16*(P1 - 1) # Note the conversion to 0-based indexing!

v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    v[0]+= 1

ordenacao = customHash(someSequence)
print(ordenacao)
print(v)

Unfortunately, I am not getting the expected output. I would like the output to show each codon's frequency, in the order I defined with the new formula, but I only get something like [1, 0, 0, 0, 0, 0, 0, 0, 0, ...]. I would like to get ['CODON' = FREQUENCY], for example ['AAA' = 1, 'AAG' = 2, 'AAC' = 1].
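Two lines in the ordering script above explain that output: `v[0]+= 1` always increments the first slot instead of the codon's slot, and `customHash(someSequence)` only reads the first three letters of the whole sequence. Also, with the mapping A=1, G=2, C=3, U/T=4 that formula returns 1 to 64, so 1 has to be subtracted to index a 64-element list, and once customHash is called per codon the leftover single 'a' at the end of the sequence must be skipped. A minimal corrected sketch that keeps that mapping and prints the ['CODON' = FREQUENCY] format (only codons that occur are listed; `labels` and `pairs` are hypothetical names added for printing):

```python
someSequence = 'atguuucccggggtataaggcaaaa'

# Mapping from the ordering script (A=1, G=2, C=3, U/T=4)
valueConversion = {'A': 1, 'G': 2, 'C': 3, 'U': 4, 'T': 4}


def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return P3 + 4 * (P2 - 1) + 16 * (P1 - 1) - 1  # minus 1: 0-based list index


v = [0] * 64
labels = [''] * 64  # hypothetical: codon seen at each index, for printing
for idx in range(0, len(someSequence) - 2, 3):  # skip the incomplete last codon
    codon = someSequence[idx:idx + 3].upper()
    h = customHash(codon)
    v[h] += 1
    labels[h] = codon

pairs = ["'{}' = {}".format(labels[i], v[i]) for i in range(64) if v[i]]
print('[' + ', '.join(pairs) + ']')
```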

I have several sequences that I grouped into one file with the Linux cat command, and I would like to run this code on that file to generate the codon frequencies for each sequence.
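For a file built with cat that holds several FASTA records, one option is to parse the records first and then count codons per record. A minimal standard-library sketch: `read_fasta` and `count_codons` are hypothetical helper names, and it assumes each record starts with a '>' header line and that its sequence may span several lines. Biopython's `SeqIO.parse(path, 'fasta')` is an alternative for the parsing step.

```python
import io
from collections import Counter


def read_fasta(lines):
    """Hypothetical helper: yield (header, sequence) pairs from FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, ''.join(chunks)


def count_codons(sequence):
    """Hypothetical helper: count frame-1 codons, ignoring an incomplete last codon."""
    return Counter(sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))


# Example input standing in for the concatenated file; with a real file:
#   with open('file.fasta') as fasta: records = list(read_fasta(fasta))
example = io.StringIO(">seq1\natgatg\n>seq2\natguuu\nccc\n")
results = {name: count_codons(seq) for name, seq in read_fasta(example)}
for name, counts in results.items():
    print(name, dict(counts))
```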

(reply by projetoic, 8 weeks ago)

Further help comes at a cost of 500 euro per hour (or fraction thereof). I imagine you'll want to code the last bit yourself...

(reply by Devon Ryan, 8 weeks ago)

OK. Thanks! :)

(reply by projetoic, 8 weeks ago)

Did you check my answer? [sigh]

$ echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g;
> END { foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
> $final =~s/, $//; print "[$final]" }'
["AAA" = 1, "ATG" = 1, "CCC" = 1, "GGC" = 1, "GGG" = 1, "GTA" = 1, "TAA" = 1, "UUU" = 1]
(reply by Jorge Amigo, 7 weeks ago)

Running it gives:

syntax error at -e line 2, near ">"
syntax error at -e line 3, near ""[$final]" }"
Execution of -e aborted due to compilation errors.

What do these errors mean? I can't see any output...

(reply by projetoic, 7 weeks ago)

What I wrote in my comment was exactly what you would see on your screen, so the $ and > characters at the beginning of each line are not meant to be copied and pasted. If you want to copy and paste it, use this instead:

echo "atguuucccggggtataaggcaaaa" | perl -ne '$cs{$1}++ while /(...)/g; END {
foreach $c (sort keys %cs) { $final .= "\"".uc($c)."\" = $cs{$c}, " }
$final =~ s/, $//; print "[$final]" }'
(reply by Jorge Amigo, 7 weeks ago)

Thanks! ;)

(reply by projetoic, 7 weeks ago)
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}

def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1)*16) + P2 + ((P3-1)*4) - 1  # Note the conversion to 0-based indexing!

someSequence = 'atgatgaaauuu'
d = dict()

v = [0] * 64  # N.B., python uses 0-based indexing
for idx in range(0, len(someSequence), 3):
    codon = someSequence[idx:idx+3].upper()
    v[0] += 1
    if codon not in d:
        d[codon] = 0
    d[codon] += 1

odernation = customHash(someSequence)

for codon, cnt in d.items():
    print('{}. {}: {}'.format(odernation,codon, cnt))

Output:

44. ATG: 2
44. AAA: 1
44. UUU: 1

It always comes out as 44, and the codons are not in order.

Can you give me a hint of what I'm doing wrong? Or what can I do or study to fix it? What should I learn?
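A hint for the 44: `customHash(someSequence)` is called once, on the whole sequence, and only reads its first three letters (ATG), which hash to 44 with these values; that single number is then printed on every line. Calling customHash on each codon and sorting the dictionary items by that value gives the intended order. A minimal sketch under the same values, assuming an incomplete trailing codon should be skipped:

```python
valueConversion = {'T': 1, 'U': 1, 'C': 2, 'A': 3, 'G': 4}


def customHash(codon):
    P1 = valueConversion[codon[0].upper()]
    P2 = valueConversion[codon[1].upper()]
    P3 = valueConversion[codon[2].upper()]
    return ((P1 - 1) * 16) + P2 + ((P3 - 1) * 4) - 1  # 0-based index


someSequence = 'atgatgaaauuu'
d = dict()
for idx in range(0, len(someSequence) - 2, 3):  # skip an incomplete last codon
    codon = someSequence[idx:idx + 3].upper()
    d[codon] = d.get(codon, 0) + 1

# Hash each codon individually and sort the (codon, count) pairs by that value
ordered = sorted(d.items(), key=lambda item: customHash(item[0]))
for codon, cnt in ordered:
    print('{}. {}: {}'.format(customHash(codon), codon, cnt))
```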

(reply by projetoic, 7 weeks ago)

8 weeks ago, Jorge Amigo (Santiago de Compostela, Spain) wrote:

As Devon says, you don't need that equation to count codons. If the FASTA sequence is serialized (all on one line), you can just use this Perl one-liner:

echo "atgatggtagtacatcatcat" | perl -lne '$cs{$1}++ while /(...)/g;
END { foreach $c (sort keys %cs) { print uc($c).": $cs{$c}"  } }'
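For comparison, a rough Python equivalent of this one-liner, assuming the sequence is already on a single line: `re.findall(r'...', sequence)` plays the role of Perl's `/(...)/g` loop (non-overlapping runs of three characters, with a leftover of one or two characters ignored), and the keys are printed sorted and upper-cased.

```python
import re
from collections import Counter

sequence = "atgatggtagtacatcatcat"
# Like Perl's /(...)/g: non-overlapping runs of any three characters
counts = Counter(re.findall(r'...', sequence))

for codon in sorted(counts):
    print(codon.upper() + ": " + str(counts[codon]))
```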