Quantify number of a specific codon in a group of genes
6.5 years ago

Dear all

I have a FASTA file containing a set of genes of interest, and I want to know how many of them contain the codon TTA.

Any ideas on how to solve this?

Thanks.

gene • 1.4k views

Not enough information: what are those FASTA sequences? cDNA? Genomic? mRNA? Plus strand only?

The FASTA sequences are cDNA.

Can you show an example? Do they start with the start codon? Are there UTRs that we have to take into account?

- For instance, this sequence has two TTA codons:

>EFG05278 cdna chromosome:Strep_clav_ATCC27064:chromosome:295573:297627:1 gene:SCLAV_0202 gene_biotype:protein_coding transcript_biotype:protein_coding gene_symbol:plcA description:Putative non-hemolytic phospholipase C
ATGGCTGATGTCAACCGCCGCCGGTTCCTCCAGATCGCGGGTGCGACCGCGGGCCACGCG
GCGCTCTCCAGCAGCGTCGAACGCGCCGCGGCCCTCCCGGCGAACCGCCGGCACGGCACC
ATCGAGGACGTCGAGCACATCGTCGTCCTGATGCAGGAGAACCGCTCCTTCGACCA***TTA***T
TTCGGGGCACTCCGGGGCGTACGGGGCTTCGGTGATCCCCGGCCGTACATCCTGGACTCC
GGCATGTCCGTCTGGCACCAGTCGGACGGCGCGCGGGAGGTGCTGCCGTACCGTCCGGAC
CTCGACGACCTCGGGATGCAGTTCCTCGCCGGTCTCCGCCA***TTA***CTGGTCCGACGGCCAC
GCGGCCTGGAACAACGGGAAGTACGACCGCTGGCTCCCGGCGAAGTCGGCGGGGACGATG
GCCCATCTGACCCGCGACGACATCCCGTTCCACTACGCCCTCGCGGACGCGTTCACCGTG
TGCGACGCGTACCACTGCTCGTTCATCGGCGCGACCGACCCCAACCGCTACTACATGTGG
AC

But I have around 7000 sequences; some do not have the codon and many have it more than once. I only need to know whether each sequence contains the codon; it does not matter if it is repeated within the sequence.

- All the sequences start with the start codon.

- Very good question about the UTRs. I am only interested in the codons in the ORF.

What have you tried? We love it when people show some effort and we can solve their problems, rather than getting open questions.

You should also specify whether TTA has to be in frame.
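
To illustrate why the frame matters, here is a minimal Python sketch. The function names and the two toy sequences are made up for illustration, and it assumes frame 0 (the first base of the sequence) is the reading frame. A plain substring search finds TTA anywhere, while an in-frame check only looks at positions that are multiples of 3.

```python
def has_codon_any_frame(seq, codon="TTA"):
    """True if the three letters occur anywhere in the sequence."""
    return codon in seq.upper()


def has_codon_in_frame(seq, codon="TTA"):
    """True only if the codon starts at position 0, 3, 6, ... (frame 0)."""
    seq = seq.upper()
    return any(seq[i:i + 3] == codon for i in range(0, len(seq) - 2, 3))


# Toy sequences (hypothetical, not from the poster's data).
out_of_frame = "ATGCATTACTGA"  # codons: ATG CAT TAC TGA; TTA spans two codons
in_frame = "ATGTTATGA"         # codons: ATG TTA TGA

print(has_codon_any_frame(out_of_frame), has_codon_in_frame(out_of_frame))
print(has_codon_any_frame(in_frame), has_codon_in_frame(in_frame))
```

The two checks can disagree on the same sequence, so the answer to "how many genes contain TTA" depends on which one is meant.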

I do not have enough programming skills to solve this problem myself. I tried searching for tools on OMICtools and for similar posts here on Biostars, but I did not find any.

TTA has to be in frame.

6.5 years ago

I wrote the following naive solution, which checks for in-frame TTA codons and writes each FASTA record that contains one to a new file. Is that okay? Or should it do something else with those records?

Save as codon-filter.py (or any other name you like) and execute as:

python codon-filter.py your_sequences.fasta > sequences_with_TTA.fasta

I don't really have appropriate testing data so this code is untested.

Oh yeah, this is python3.
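
The script itself did not survive in this copy of the thread, so what follows is not the original code. It is a minimal sketch of what such a filter could look like, written in plain Python 3 with no external libraries. It assumes, as stated earlier in the thread, that every record starts at the start codon, so frame 0 is the reading frame. The names `read_fasta`, `has_inframe_codon` and `main` are hypothetical.

```python
import sys

CODON = "TTA"  # codon of interest, taken from the question


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def has_inframe_codon(seq, codon=CODON):
    """True if `codon` starts at a position that is a multiple of 3."""
    seq = seq.upper()
    return any(seq[i:i + 3] == codon for i in range(0, len(seq) - 2, 3))


def main(path):
    kept = total = 0
    with open(path) as handle:
        for header, seq in read_fasta(handle):
            total += 1
            if has_inframe_codon(seq):
                kept += 1
                print(">" + header)
                # Re-wrap the sequence at 60 bases per line.
                for i in range(0, len(seq), 60):
                    print(seq[i:i + 60])
    # The summary goes to stderr so it does not end up in the FASTA output.
    print(f"{kept} of {total} sequences contain an in-frame {CODON}",
          file=sys.stderr)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python codon-filter.py your_sequences.fasta",
              file=sys.stderr)
```

Run as shown above, the matching records go to `sequences_with_TTA.fasta`, and the count printed on stderr answers the "how many genes" part of the question. Note that this sketch does not stop at the stop codon, so a sequence with a 3' UTR could give a false hit downstream of the ORF.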
