Hi everyone, I'm trying to create a local (custom) database for Prokka from a multi-FASTA file that contains DNA sequences of different bacterial genes. It looks like this:
>mecI:1:D86934
TTATTTTTTATTCAATATATTTCTCAATTCTTCTATTTCATCTTGTGATAGATCTTCTTTTTCTACAAAGTTTAAGACAAGTGAATTGAAACCGCCTTTGTATACTTTATTGATAAAGTTTTTAGATGTTTTATATTTTATATCACTTTCTTCTACAAGAGAGTAATATTGAAAAATTTTATTGTCTTTTTTACGATTA
>mecI:2:AB037671
TTAAAAAATTTTATTGTCTTTTTTACGATCTATAAATCCCTTTTTATACAATCTCGTTATAAGTGTACGAATGGTTTTTGGACTCCAGTCCTTTTGCATTTGTATTTCTTCTATTATATTATTCGCACTTGCATATTTTTCATCCAAATGATATTCATAACTTCCCATTCTGCAGATGATATTTCATACGTTTTATTATCCAT
>mecI:3:FJ670542
TTATTTTTTATTCAATATATTTCTCAATTCTTCTATTTCATCTTGTGATAGATCTTCTTTTTCTACAAAGTTTAAGACAAGTGAATTGAAACCGCCTTTGTATACTTTATTGATAAAGTTTTTAGATGTTTTATATTTTATATCACTTTCTTCTACAAGAGAGTAATATTGAAAAATTTTATTGTCTTTTTTACGATC
>mecI:4:FJ390057
ATGGATAATAAAACGTATGAAATATCATCTGCAGAATGGGAAGTTATGAATATCATTTGGATGAAAAAATATGCAAGTGCGAATAATATAATAGAAGAAATACAAATGCAAAAGGACTGGAGTCCAAAAACCATTCGTACACTTATAACGAGATTGTATAAAAAGGGATTTATAGATCGTAAAAAAGACAAT........
I've been trying to do this by first converting the initial DNA multi-FASTA file into a protein multi-FASTA file with the EMBOSS transeq tool, then creating a BLAST database and indexing it in the Prokka database directory, and finally typing a bacterial genome against this custom database to get the GBK output. However, is there an easier way to do this? I'm losing some sequence information during the EMBOSS transeq step. Perhaps there is a way to set up a DNA database instead of a protein database?
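To make the translation step concrete, here is a minimal Python sketch of what I am trying to achieve with it. Some records in the file appear to be stored on the reverse strand (mecI:2 ends with the reverse complement of the ATG start of mecI:4), so a single-frame translation may be one reason information gets lost. The sketch translates all six frames and keeps the longest stretch without a stop codon. That heuristic, the function names, and the use of the standard codon table are my own assumptions for illustration; this is not how transeq or Prokka work internally.

```python
from itertools import product

# Standard codon table built in TCAG order (assumption: no organism-specific
# reassignments are needed for these genes).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate in frame 1; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def best_frame_protein(seq):
    """Longest stop-free stretch over all six frames (hypothetical heuristic)."""
    seq = seq.upper()
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for offset in range(3):
            for segment in translate(strand[offset:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def dna_fasta_to_protein(in_path, out_path):
    """Write one protein record per DNA record, keeping the original header."""
    with open(out_path, "w") as out:
        for header, seq in read_fasta(in_path):
            out.write(f">{header}\n{best_frame_protein(seq)}\n")
```

It would be called as `dna_fasta_to_protein("genes.fasta", "genes.faa")`, where both file names are placeholders. The resulting protein file could then presumably be given to Prokka through its `--proteins` option, but I have not confirmed that this is the recommended route.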
Thank you guys.