TransDecoder v5.0.2 - nucleotide sequences with ambiguity characters S, R, W, K, Y
6.1 years ago

Hello,

I downloaded the coding sequences of 30 different species from Ensembl or NCBI. To remove redundant transcripts and isoforms, I first ran cd-hit-est and then passed the files through TransDecoder.

In TransDecoder v5.0.2 I ran steps 1 and 3, but the final file, fasta.transdecoder.cds, contains ambiguity characters such as S, R, W, K and Y.

I would like to know how I can solve this problem, because I have to run OMAbrowser later on.

Thanks,

Best regards, Daniela

6.1 years ago

Those IUPAC ambiguity codes were probably already there before the TransDecoder step, as TransDecoder does not change the DNA/mRNA sequence itself. You should check whether cd-hit-est introduced them or whether they were already present in the original genomic data you downloaded.
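One way to find out where the codes come from is to count them at each stage. The following is a minimal sketch in plain Python, not part of TransDecoder or cd-hit; the function name `count_ambiguity_codes` and the example records are made up for illustration.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes other than N
# (i.e. everything that is not A, C, G, T/U or N).
AMBIGUITY_CODES = set("RYSWKMBDHV")


def count_ambiguity_codes(fasta_lines):
    """Count IUPAC ambiguity codes in the sequence lines of a FASTA file.

    `fasta_lines` is any iterable of text lines, e.g. an open file handle.
    Header lines (starting with '>') are skipped.
    """
    counts = Counter()
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for base in line.upper():
            if base in AMBIGUITY_CODES:
                counts[base] += 1
    return counts


# Hypothetical in-memory example; with real data, pass an open file instead.
example = [">seq1", "ATGRCCY", "ttgsa", ">seq2", "ATGNNNW"]
print(count_ambiguity_codes(example))
```

Running this on the original download, on the cd-hit-est output and on the TransDecoder output (for instance with `with open(path) as fh: count_ambiguity_codes(fh)`, where `path` is whichever file you want to inspect) should show at which step the codes first appear.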

To get rid of them you can simply replace them with Ns. You lose some information that way, but most downstream software should be able to deal with Ns.
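The replacement could be done along these lines. This is only a sketch in plain Python; the name `mask_ambiguity_codes` is hypothetical, and it assumes a standard FASTA layout in which header lines start with `>`. Headers are left untouched and letter case is preserved.

```python
# Translation table mapping every IUPAC ambiguity code (other than N)
# to N, keeping upper/lower case as in the input.
MASK_TABLE = str.maketrans(
    "RYSWKMBDHVryswkmbdhv",
    "N" * 10 + "n" * 10,
)


def mask_ambiguity_codes(fasta_lines):
    """Yield FASTA lines with ambiguity codes in sequence lines replaced by N.

    `fasta_lines` is any iterable of text lines, e.g. an open file handle.
    Header lines (starting with '>') are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line
        else:
            yield line.translate(MASK_TABLE)


# Hypothetical in-memory example.
for masked in mask_ambiguity_codes([">seqR", "ATGRCCY", "ttgsa"]):
    print(masked)
```

To process a real file, open the input and output handles yourself and write each yielded line followed by a newline. Keep in mind that an N inside a codon will translate to an unknown amino acid in many cases.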
