Trimming/masking ambiguous codon positions from nucleotide sequences
4.0 years ago

I have a large number of FASTA files, each containing a single nucleotide sequence. All of the sequences are in frame (though not all of them contain a start codon), and some contain ambiguous characters (Ns) where the identity of a base is unknown because of poor quality or missing sequence information.

At times, a particular codon may only have a single base that was properly sequenced, such as below, a short example sequence from one such file: ...GTGCTGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAG...

I would like to trim the Ns out of all of these sequences in a "codon-sensitive" way, so that the trimming either leaves the CNN in place or, ideally, removes the C together with the Ns. Removing the Ns alone is trivial for me, but I am not sure how to do it while respecting codon boundaries. In case it is helpful, I already have the corresponding amino acid sequence.
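To make the desired behaviour concrete, here is a minimal Python sketch of the two options (removing or masking every codon that contains at least one N). The function names `trim_ambiguous_codons` and `mask_ambiguous_codons` and the `frame` parameter are hypothetical, introduced only for illustration. The sketch assumes the sequence has already been read from the FASTA file into a plain string and that the reading frame starts at the first base.

```python
def trim_ambiguous_codons(seq, frame=0, ambiguous="Nn"):
    """Return seq with every codon containing an ambiguous base removed.

    Assumes codons start at offset `frame`. Bases before `frame` and any
    trailing incomplete codon are dropped.
    """
    kept = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if not any(base in ambiguous for base in codon):
            kept.append(codon)
    return "".join(kept)


def mask_ambiguous_codons(seq, frame=0, ambiguous="Nn"):
    """Return seq with every codon containing an ambiguous base replaced by Ns.

    The output has the same length as the input, so codon positions still
    line up with an existing amino acid sequence.
    """
    out = [seq[:frame]]
    for i in range(frame, len(seq), 3):
        codon = seq[i:i + 3]
        if any(base in ambiguous for base in codon):
            out.append("N" * len(codon))
        else:
            out.append(codon)
    return "".join(out)


# Shortened version of the example: GTG CTG CNN NNN AAG
example = "GTGCTGCNNNNNAAG"
trimmed = trim_ambiguous_codons(example)  # the C is removed with the Ns
masked = mask_ambiguous_codons(example)   # CNN becomes NNN, length unchanged
```

On the shortened example, trimming removes the partial CNN codon together with the all-N codon, while masking turns the lone C into an N and keeps the original length.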

codon nucleotide fasta trimming masking • 788 views