Question: How to find start and stop codons for sequences in a fasta file?
grayapply2009 (United States), 4.4 years ago, wrote:

I ran blastn and blastx on my sequences (~400,000 sequences). How do I find and label the start and stop codons for each sequence in a fasta file?

next-gen • 3.8k views
modified 4.4 years ago by Kamil • written 4.4 years ago by grayapply2009
Kamil (Boston), 4.4 years ago, wrote:

I suggest that you read about the genetic code to find the codons relevant to your organism.

You'll want to search for codons, perhaps with a tool like fasgrep. You might write your own script if you have a particular output format in mind.

On second glance, it seems that fasgrep is only useful for searching for sequence identifiers, not the sequences themselves.

modified 4.4 years ago • written 4.4 years ago by Kamil

Thank you for the information, Kamil. So fasgrep works like ExPASy? It picks the longest possible translated sequence?

written 4.4 years ago by grayapply2009

fasgrep is like grep. It searches for a string in a body of text. In your question, you ask about finding codons. I'd recommend using a search tool like grep to find codons.
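To illustrate what a grep-style codon search returns, here is a minimal Python sketch. It assumes the standard genetic code (ATG as start; TAA, TAG, TGA as stops), and the function name `codon_hits` is made up for this example. It reports every occurrence together with its reading frame, which shows why a plain string search alone cannot tell you which ATG or stop belongs to the coding sequence.

```python
import re

# Assumption: standard genetic code. Other organisms or organelles may differ.
START = ("ATG",)
STOPS = ("TAA", "TAG", "TGA")

def codon_hits(seq, codons):
    """Return (position, frame, codon) for every occurrence of the given codons.

    Positions are 0-based; frame is position % 3 on the forward strand.
    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = re.compile("(?=(" + "|".join(codons) + "))")
    return [(m.start(), m.start() % 3, m.group(1))
            for m in pattern.finditer(seq.upper())]

seq = "CCATGAAATAGATGTGA"
print(codon_hits(seq, START))  # every ATG, in any frame
print(codon_hits(seq, STOPS))  # every stop codon, in any frame
```

Only hits that share a frame can belong to the same open reading frame, so the frame column is what you would filter on next.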

If you have a different goal, you should edit your question. For example, if you wish to find possible coding sequences within a nucleotide sequence, you might consider other tools designed for this purpose:

As you mentioned, ExPASy is a nice portal to find other tools that might meet your needs.

modified 4.4 years ago • written 4.4 years ago by Kamil

Yeah, I want to identify the start and stop codons for each sequence, but how do I know the codons grepped by fasgrep are the correct ones for the coding sequence? I mean, there are multiple "ATG"s and "TAG"s. Does this program take the reading frame into consideration?

Besides, how do I label those codons when I grep them in a fasta file?
 

written 4.4 years ago by grayapply2009

If existing programs do not meet your needs, then you should write your own scripts to achieve your goals. If you're familiar with Python, this looks like a good starting point: Identifying open reading frames
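As a rough idea of what such a script could look like (this is not the code from the linked page, just a sketch), the following scans all six reading frames for ATG...stop stretches and sorts them by length. It assumes the standard genetic code and plain A/C/G/T input; the names `find_orfs` and `orfs_in_strand` are invented for the example, and coordinates on the "-" strand refer to the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq):
    """Yield (start, end) for each ATG...stop ORF in the three frames of one strand.

    Coordinates are 0-based, end is exclusive and includes the stop codon.
    Within a frame, an ORF begins at the first ATG after the previous stop.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                yield start, i + 3
                start = None

def find_orfs(seq, min_len=0):
    """Return (strand, start, end, orf_sequence) tuples, longest first."""
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, end in orfs_in_strand(s):
            if end - start >= min_len:
                results.append((strand, start, end, s[start:end]))
    return sorted(results, key=lambda r: r[2] - r[1], reverse=True)

for orf in find_orfs("CCATGAAATAGATGTGA"):
    print(orf)
```

Taking the first element of the result gives the longest complete ORF, which is a common heuristic but not a guarantee that it is the real coding sequence.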

Consider providing an example of your input and an example of your desired output. That might increase the clarity of your question.

written 4.4 years ago by Kamil

Great! I'll take a look at the code. Thank you, Kamil!

modified 4.4 years ago • written 4.4 years ago by grayapply2009
Antonio R. Franco (Spain, Universidad de Córdoba), 4.4 years ago, wrote:

Depending on how you obtained these sequences, it is likely that the start and/or the stop codon is missing.

BlastX can still find a homologous protein sequence from the translation of an internal part of your sequence, even if the sequence lacks the start and stop codons.
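If you already know the frame of a BlastX hit, one way to check whether the coding region is complete is to look at the longest stop-free stretch in that frame and see whether it contains an ATG and whether it ends at a stop. The sketch below does that for a forward frame (0, 1 or 2). It assumes the standard genetic code, and `describe_frame` is a made-up name for this example.

```python
STOPS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code

def describe_frame(seq, frame):
    """Describe the longest stop-free stretch in one forward reading frame.

    Returns a dict with 0-based nt_start, exclusive nt_end (stop codon not
    included), has_start (an in-frame ATG is present) and has_stop (the
    stretch is terminated by a stop codon rather than by the sequence end).
    Returns None if the frame contains no stop-free codons at all.
    """
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    best = None
    seg_start = 0
    for idx in range(len(codons) + 1):
        at_end = idx == len(codons)
        if at_end or codons[idx] in STOPS:
            segment = codons[seg_start:idx]
            if segment and (best is None or len(segment) > best["n_codons"]):
                best = {
                    "n_codons": len(segment),
                    "nt_start": frame + 3 * seg_start,
                    "nt_end": frame + 3 * idx,
                    "has_start": "ATG" in segment,
                    "has_stop": not at_end,
                }
            seg_start = idx + 1
    return best

print(describe_frame("GCTGCTGCTTAAGG", 0))  # stop present, no ATG
print(describe_frame("ATGGCTGCT", 0))       # ATG present, no stop
```

A stretch with `has_start` false probably lost its 5' end, and one with `has_stop` false probably lost its 3' end, although an ATG inside the stretch does not have to be the true start.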

written 4.4 years ago by Antonio R. Franco

How do I find the start and stop codons in the fasta file if the sequences have at least one of them?

written 4.4 years ago by grayapply2009

For an individual sequence, you can try services like:

- NCBI ORFfinder

- Try EMBOSS. There are several programs available, in graphic and text mode. EMBOSS will allow you to use a fasta file with many sequences at once. 
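If you would rather label the codons yourself in a multi-sequence fasta file, a small script can do it. The sketch below is only an illustration: it looks at the forward strand only, assumes the standard genetic code, picks the longest ATG...stop stretch, and writes 1-based coordinates into the header. The `start_codon=` / `stop_codon=` label format and the function names are invented for this example.

```python
import re

# In-frame ATG ... first in-frame stop; the lookahead allows overlapping candidates.
ORF_RE = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of fasta lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def label_record(header, seq):
    """Return a new header line carrying 1-based start/stop codon coordinates."""
    hits = list(ORF_RE.finditer(seq.upper()))
    if not hits:
        return f">{header} orf=none"
    best = max(hits, key=lambda m: len(m.group(1)))  # longest candidate
    start = best.start() + 1                          # 1-based, inclusive
    end = best.start() + len(best.group(1))           # 1-based, inclusive
    return f">{header} start_codon={start}-{start + 2} stop_codon={end - 2}-{end}"

example = [">seq1", "CCATGAAA", "TAGATGTGA", ">seq2", "CCCCCC"]
for header, seq in read_fasta(example):
    print(label_record(header, seq))
    print(seq)
```

To run it on a real file, pass an open file handle to `read_fasta` instead of the example list.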

written 4.4 years ago by Antonio R. Franco