Question: Trimming/masking ambiguous codon positions from nucleotide sequences
Posted 6 months ago by john.soghigian:

I have a large number of FASTA files, each containing a single nucleotide sequence. All of the sequences are in frame (though not all of them begin with a start codon), and some contain ambiguous characters (Ns) where the identity of a base is unknown, either because of poor sequence quality or because sequence information is missing.
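Because every file holds exactly one record, loading them does not need an external library. The following is a minimal sketch that assumes the files use a `.fasta` extension. `read_single_fasta` and `iter_fasta_dir` are hypothetical helper names and do not come from any package. If Biopython is already installed, its `SeqIO.read(handle, "fasta")` does the same single-record job.

```python
from pathlib import Path


def read_single_fasta(text: str) -> tuple[str, str]:
    """Parse FASTA text that holds exactly one record; return (header, sequence)."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected exactly one record per file")
            header = line[1:]
        else:
            chunks.append(line.upper())  # treat lowercase bases like uppercase
    if header is None:
        raise ValueError("no FASTA header line found")
    return header, "".join(chunks)


def iter_fasta_dir(directory: str, pattern: str = "*.fasta"):
    """Yield (path, header, sequence) for every matching file (pattern is an assumption)."""
    for path in sorted(Path(directory).glob(pattern)):
        header, seq = read_single_fasta(path.read_text())
        yield path, header, seq
```

Upper-casing at load time means later steps only need to look for `N`, not `n`.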

Sometimes a codon has only a single base that was sequenced properly, as in this short excerpt from one of the files:

...GTGCTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAG...
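Cases like CNN can be found by reading the sequence three bases at a time from the first position. This is valid only because the sequences are in frame. The sketch below labels each codon as complete, partial (some bases known), or all-N. It uses a shortened, hypothetical version of the excerpt and assumes that the excerpt starts on a codon boundary (GTG CTG CNN ...). The function names are illustrative.

```python
def codons(seq: str) -> list[str]:
    """Split an in-frame sequence into codons (a trailing 1-2 base remainder is kept)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify(codon: str) -> str:
    """Label a codon by how much of it was actually sequenced."""
    if len(codon) < 3:
        return "truncated"  # leftover bases after the last full codon
    n = codon.count("N")
    if n == 0:
        return "complete"
    if n == 3:
        return "all_N"
    return "partial"  # e.g. CNN, NNC or CNG: some bases known, codon still ambiguous


# Hypothetical fragment modelled on the excerpt above, with the N run shortened.
fragment = "GTGCTG" + "C" + "N" * 8 + "AAG"
for codon in codons(fragment):
    print(codon, classify(codon))
# GTG complete / CTG complete / CNN partial / NNN all_N / NNN all_N / AAG complete
```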

I would like to trim the Ns from all of these sequences in a "codon-sensitive" way, so that the trimming either leaves the CNN codon in place or, ideally, removes the C together with the Ns. Removing the Ns on their own is trivial, but I am not sure how to do it without breaking up the codons. If it helps, I already have the corresponding amino acid sequence for each file.
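One way to keep the trimming codon-sensitive is to remove or mask whole codons rather than individual bases. The result then stays a multiple of three and in frame. The sketch below makes several assumptions:

- Sequences are uppercase, and N is the only ambiguity code.
- The protein has exactly one residue per full codon, so residue k comes from codon k. Ambiguous codons are written as X here.

`trim_codons`, `trim_protein`, and `mask_partial_codons` are hypothetical names. `drop_partial=True` gives the ideal behaviour (the C is removed with the Ns), and `drop_partial=False` keeps CNN and drops only all-N codons.

```python
def trim_codons(seq: str, drop_partial: bool = True) -> tuple[str, list[int]]:
    """Remove ambiguous codons without shifting the reading frame.

    drop_partial=True  -> drop every codon containing an N (CNN is removed).
    drop_partial=False -> drop only codons that are entirely N (CNN is kept).
    Returns the trimmed sequence and the indices of the codons that were kept.
    Bases after the last full codon are discarded.
    """
    kept_codons = []
    kept_idx = []
    for i in range(0, len(seq) - len(seq) % 3, 3):  # full codons only
        codon = seq[i:i + 3]
        n = codon.count("N")
        if n == 3 or (drop_partial and n > 0):
            continue  # skip the whole codon so the frame is preserved
        kept_codons.append(codon)
        kept_idx.append(i // 3)
    return "".join(kept_codons), kept_idx


def trim_protein(protein: str, kept_idx: list[int]) -> str:
    """Keep the residues whose codons survived (assumes residue k <-> codon k)."""
    return "".join(protein[k] for k in kept_idx)  # IndexError if lengths disagree


def mask_partial_codons(seq: str) -> str:
    """Masking alternative: turn any codon containing an N into NNN, keeping the length."""
    full = len(seq) - len(seq) % 3
    out = []
    for i in range(0, full, 3):
        codon = seq[i:i + 3]
        out.append("NNN" if "N" in codon else codon)
    return "".join(out) + seq[full:]  # trailing partial codon left untouched


fragment = "GTGCTG" + "C" + "N" * 8 + "AAG"  # hypothetical, shortened excerpt
trimmed, kept = trim_codons(fragment)
print(trimmed, kept, trim_protein("VLXXXK", kept))  # GTGCTGAAG [0, 1, 5] VLK
```

Deleting codons from the middle of a sequence joins the flanking codons. That is harmless for a single sample, but it shifts positions relative to other samples. If the sequences will be aligned later, masking (which keeps the length) is usually the safer option.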



Powered by Biostar version 2.3.0