Trimming Augustus coding sequences to first reading frame
5.3 years ago

Hi,

I am attempting to carry out genome-wide dN/dS analyses, and have predicted coding sequences and proteins from two species using Augustus. My problem is that the Augustus coding and protein sequences do not completely match up, because some of the coding sequences from incomplete genes carry extra nucleotides at their 5' or 3' ends. For instance, a gene that is incomplete at the 5' end may have one or two extra nucleotides before the start of the ORF, so the protein is not encoded in the first reading frame. The same kind of problem occurs when the 3' end is incomplete: there are often one or two extra nucleotides after the last complete codon. The dN/dS programs that I have been trying to use do not like this!

Does anyone have any ideas on how I can trim the Augustus coding sequence predictions so that they all start in the first reading frame and contain only complete codons? I noticed that the Augustus getAnnoFasta.pl script has a flag called '--chop_cds' that seems like it should work, but I've found that it doesn't do what I want.

Thanks for your help! Ryan

augustus reading frame • 1.1k views

With agat_sp_extract_sequences.pl from AGAT you can extract the CDS and choose whether or not to clip the first base(s) so that the sequence starts in frame. That may be what you are looking for.


Thanks Juke34. AGAT seems ideal from the perspective of trimming the offset 5' nucleotides, but does it have the capacity to trim off 3' bases that aren't in frame? I realize that I could probably script this, but it would be great to have a tool that does this too!


No, it does not. But once the 5' end is trimmed, a simple command can clean up the 3' side: taking the sequence length modulo 3 tells you how many nucleotides to remove from the end of the sequence.
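
For illustration, the modulo-3 step could look something like the Python sketch below. It assumes the 5' end has already been clipped so that each sequence starts in frame (or that you know the phase and pass it in). The function names `read_fasta` and `trim_cds` and the tiny in-memory FASTA are hypothetical, made up for this example; they are not part of AGAT or Augustus.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def trim_cds(seq, phase=0):
    """Drop `phase` leading bases, then any trailing bases that do not
    form a complete codon (length modulo 3)."""
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = seq[phase:]
    extra = len(seq) % 3          # 0, 1 or 2 dangling bases at the 3' end
    return seq[:len(seq) - extra]


# Hypothetical example input: g1 has two dangling 3' bases, g2 is already clean.
example = [">g1", "ATGAAA", "CC", ">g2", "ATGTAA"]
trimmed = {name: trim_cds(seq) for name, seq in read_fasta(example)}

for name, seq in trimmed.items():
    print(f">{name}\n{seq}")
```

To run it on a real file, the list `example` can be replaced with an open file handle, since `read_fasta` only needs an iterable of lines. If the 5' end has not been clipped beforehand, the `phase` argument would have to come from the phase column of the first CDS feature in the GFF for each transcript.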
