Trimming/masking ambiguous codon positions from nucleotide sequences

4.0 years ago

john.soghigian • 0

I have a large number of FASTA files, each containing a single nucleotide sequence. All of the sequences are in frame, although not all of them contain a start codon. Some contain ambiguous characters (Ns) where the identity of a base is unknown, due either to poor quality or to absent sequence information.

In some cases only a single base of a codon was properly sequenced, as in this short excerpt from one such file:

...GTGCTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAG...

I would like to trim the Ns out of all of these sequences in a "codon-sensitive" way, so that the trimming either leaves the partial codon CNN in place or, ideally, removes the C together with the Ns. Removing the Ns alone is trivial, but I am not sure how to do it while respecting codon boundaries. If it is helpful, I already have the corresponding amino acid sequence.
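To make the desired behaviour concrete, here is a minimal Python sketch of what "codon-sensitive" handling could look like. It assumes the sequence starts at the first position of a codon (frame 0) and that N is the only ambiguity code present. The function name `clean_codons` and its `mode` parameter are hypothetical, made up for illustration, and the short test sequence is invented rather than taken from my files.

```python
def clean_codons(seq, mode="trim", ambiguous="Nn"):
    """Drop or mask every codon that contains an ambiguous base.

    Assumes `seq` is in frame, i.e. seq[0] is the first base of a codon.
    mode="trim" removes the whole codon (so CNN loses its C as well);
    mode="mask" replaces it with Ns, which preserves sequence length.
    A trailing partial codon is handled the same way as a full one.
    """
    if mode not in ("trim", "mask"):
        raise ValueError("mode must be 'trim' or 'mask'")

    kept = []
    # Walk the sequence one codon (3 bases) at a time.
    for start in range(0, len(seq), 3):
        codon = seq[start:start + 3]
        if any(base in ambiguous for base in codon):
            if mode == "mask":
                kept.append("N" * len(codon))
            # In "trim" mode the codon is simply skipped.
        else:
            kept.append(codon)
    return "".join(kept)


# Invented example: codons are GTG CTG CNN NNN AAG
example = "GTGCTGCNNNNNAAG"
print(clean_codons(example))               # GTGCTGAAG
print(clean_codons(example, mode="mask"))  # GTGCTGNNNNNNAAG
```

Trimming keeps the remaining codons in frame because bases are only ever removed in multiples of three. Masking may be preferable if the sequences need to stay the same length, for example to stay aligned with the amino acid sequence.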

codon nucleotide fasta trimming masking • 788 views
