Question: Methods to identify ORFs from a de novo assembly
3.2 years ago
I recently completed a de novo assembly for an organism whose genome is currently unknown. We are working to classify all the sequences, but would like to identify the ORFs so we can tie our data to data from MS. I have previously tried TransDecoder with homology search, as well as the EMBOSS tools, but the results were not very accurate. We have run the data against BLAST and obtained a high proportion of accurate hits, so we know our assembly is well put together. Any suggestions would be appreciated.

Was it a genome or transcriptome assembly? Assuming transcripts: if you get better results with BLAST than with TransDecoder, that is probably due to nucleotide errors introduced during assembly. Why not mark those coordinates and call SNPs for these regions?

That is an interesting thought, and yes, we used transcripts. I will have to try that on one of the better-characterized sequences. It may also be worth regenerating the assembly with another assembler and comparing the output.
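
To illustrate the point about assembly errors, here is a rough Python sketch of a naive six-frame ORF scan. It is not how TransDecoder or the EMBOSS tools work internally. It only reports complete ATG-to-stop ORFs using the standard genetic code, and the function names and the toy transcript are made up for the example. The second call deletes a single base to mimic an assembly indel, which should shift the reading frame so that the full-length ORF is no longer recovered, even though a BLAST alignment of the same sequence would still look almost perfect.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_aa=3):
    """Return (strand, frame, protein) for each ATG..stop ORF, longest first.

    Hypothetical helper for illustration: partial ORFs that run off the end
    of the transcript are ignored, although real transcripts often have them.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            # Every segment except the last one was terminated by a stop codon.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start >= min_aa:
                    orfs.append((strand, frame, segment[start:]))
    return sorted(orfs, key=lambda o: len(o[2]), reverse=True)


# Toy transcript (made up): 2 nt of UTR, a short ORF, 2 nt of UTR.
transcript = "CC" + "ATGGCTGAAAAACTGGGTTCTTAA" + "GG"
print(find_orfs(transcript))

# Simulate an assembly indel by deleting one base inside the ORF.
mutated = transcript[:10] + transcript[11:]
print(find_orfs(mutated))
```

If frameshifts like this turn out to be the problem, the predicted ORFs would be expected to stop roughly where the BLAST alignment switches frame, which is one way to mark the coordinates worth checking.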

