10.5 years ago
Adrian Pelin
Hello,
I have a draft assembly. Does anyone know of scripts to retrieve all ORFs, in protein format, from a FASTA file of contigs?
Adrian
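For reference, here is a minimal Python sketch of the naive approach being asked about. It scans all six reading frames of each contig and reports every ATG-to-stop ORF as a protein. It assumes the standard genetic code, and the 100-aa minimum length is an arbitrary cutoff. The function names are hypothetical. This is not a substitute for gene prediction.

```python
import sys
from itertools import product

# Standard genetic code (NCBI translation table 1). This is an assumption:
# swap the AMINO string if your organism uses a different code.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines):
    """Yield (name, sequence) pairs; name is the first word of each header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks).upper()
            words = line[1:].split()
            name, chunks = (words[0] if words else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks).upper()


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=100):
    """Return (strand, frame, start, end, protein) tuples for ATG...stop ORFs.

    All six frames are scanned. start/end are 0-based, end-exclusive and
    include the stop codon; for strand "-" they index the reverse complement.
    Only the first ATG after each stop is used, so nested ORFs are not
    reported, and ORFs that run off a contig end without a stop are skipped.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            aa_start = 0
            for aa_end, aa in enumerate(protein):
                if aa != "*":
                    continue
                m = protein.find("M", aa_start, aa_end)  # first ATG in this window
                if m != -1 and aa_end - m >= min_aa:
                    orfs.append((strand, frame, frame + 3 * m,
                                 frame + 3 * (aa_end + 1), protein[m:aa_end]))
                aa_start = aa_end + 1
    return orfs


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle):
            for i, (strand, frame, start, end, prot) in enumerate(find_orfs(seq), 1):
                print(f">{name}_orf{i} strand={strand} frame={frame} {start}-{end}")
                print(prot)
```

Usage would be something like `python find_orfs.py contigs.fasta > orfs.faa`, where the file names are placeholders. On a draft assembly, genes split across contig ends are silently dropped because they have no stop codon.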
Unless this is a prokaryote, extracting all open reading frames from a draft assembly is not an informative analysis (because of splicing and low gene density), and even for bacteria it is of very limited use. Instead, look into gene prediction, e.g. the gene-prediction tag on BioStar.
I work on microsporidia. They have very few introns (up to 20 intron-containing genes, and some have none at all) and small genomes. They are eukaryotes.
Then you can use getorf, as suggested by R@hul, but you should still attempt a proper gene prediction.
Please share a fully functional Perl script that translates cDNA into ORFs (protein), keeping only the longest one. I have ActivePerl installed.
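Here is a minimal sketch of the longest-ORF request, written in Python rather than Perl. For each cDNA it translates all six frames and keeps only the longest ATG-to-stop protein. It makes the following assumptions:

- The standard genetic code applies.
- cDNA orientation is unknown. Pass `both_strands=False` if your cDNAs are known to be sense-oriented.
- On ties, the first ORF found wins.

The function names are hypothetical.

```python
import sys
from itertools import product

# Standard genetic code (NCBI translation table 1), assumed here.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq):
    """Translate codon by codon; unknown codons become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf(seq, both_strands=True):
    """Return the longest M...stop protein (stop not included), or "" if none."""
    seq = seq.upper()
    strands = [seq]
    if both_strands:
        strands.append(seq.translate(COMPLEMENT)[::-1])  # reverse complement
    best = ""
    for s in strands:
        for frame in range(3):
            protein = translate(s[frame:])
            # Every piece except the last is followed by a stop codon.
            for segment in protein.split("*")[:-1]:
                m = segment.find("M")
                if m != -1 and len(segment) - m > len(best):
                    best = segment[m:]
    return best


def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, seq in read_fasta(handle):
            prot = longest_orf(seq)
            if prot:
                print(f">{name}\n{prot}")
```

Run it as e.g. `python longest_orf.py cdna.fasta > longest.faa`, where the file names are placeholders. Sequences with no complete ATG-to-stop ORF are skipped rather than printed empty.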