Hi good people,
I was using BUSCO (v5.3.2, https://busco.ezlab.org/) to extract protein-coding sequences from some full genomes. The lineage dataset I was using was eutheria_odb10. The run went through without error. While performing sequence alignment, however, I discovered that some sequences were rejected by the alignment program. They looked quite strange. Here's one, for example:
ATACACCAAAATGAAGACTGCCACCAACATCTATATTTTCAACCTTGCTCTGGCAGATGCCCTAGCAACCAGTACCCTGCCCTTCCAGAGTGTCAATTACCTAATGGGAACATGGCCCTTTGGAACCATCCTCTGCAAGATTGTGATCTCCATAGATTACTATAATATGTTCACCAGCATATTCACCCTCTGCACCATGAGCATTGATCGCTACATCGCAGTCTGCCATCCCGTCAAGGCCCTGGATTTCCGCACTCCCCGCAATGCCAAGATCGTCAACATCTGCAACTGGATCCTCTCTTCAGCCATTGGTCTGCCTGTGATGTTCATGGCGACAACAAAGTACCGGCAAGGTTCCATAGATTGTACTCTAACATTTTCTCACCCAACCTGGTACTGGGAAAACCTGCTGAAGATCTGTGTTTTCATCTTTGCTTTCATCATGCCCGTCCTCGTCATTACGGTGTGTTACGGACTGATGATCTTACGCCTCAAGAGCGTCCGCGTGCTCTCTGGCTCCAAAGAAAAGGATCGGAACCTGCGAAGAATCACCAGGATGGTGCTGGTGGTTGTGGCTGTGTTCATTGTCTGCTGGACCCCCATTCACATTTACGTCATCGTCAAAGCCTTGATCACAATCCCAGAAACTACTTTCCAGACTGTTTCATGGCACTTCTGCATTGCTCTCGGTTACACAAACAGCTGCCTGAACCCAGTCCTTTATGCGTTTCTGGATGAAAACTTCAAACGATGCTTCAGAGAGTTCTGCATCCCAACGTCCTCCACCATTGAGCAGCAAAACTCCACTAGAATGCGTCAGAACACCAGAGACCTCCCCTCCACGGCCAACACAGTGGATAGGACTAACCATCAGAAATTCAGTGGAACAAATAACCTTTCAAATGGCTACACTGCAAGTAAATATCAACATCTAAATCCCAATAATGCGATTGGATTTATCAAGAAGATGAAAAATATTCACAGTTCTTAG
I was confused as to why BUSCO considered it a protein-coding sequence at all, because it doesn't look like one. So I don't know whether I should try to salvage those sequences or not. Does anyone have any ideas, or possible ways to correct this behaviour?
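In case it helps anyone reproduce the problem, here is a minimal sketch of the kind of sanity check I have in mind for "does this look like a complete CDS": length divisible by three, an ATG at the start, a stop codon at the end, and no premature stops. It assumes the standard nuclear genetic code, and the names `codons` and `cds_report` are my own hypothetical helpers, not part of BUSCO or TranslatorX.

```python
# Stop codons of the standard nuclear genetic code (an assumption;
# other translation tables use different stop codons).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq, frame=0):
    """Split seq into complete codons, starting at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def cds_report(seq):
    """Return basic checks on whether seq looks like a complete CDS."""
    # Remove whitespace/newlines and normalise case.
    seq = "".join(seq.split()).upper()

    # Count stop codons in each forward frame, ignoring the last codon
    # of that frame (a stop there would be a normal terminator).
    internal_stops = {}
    for frame in range(3):
        frame_codons = codons(seq, frame)
        internal_stops[frame] = sum(
            1 for codon in frame_codons[:-1] if codon in STOP_CODONS
        )

    return {
        "length": len(seq),
        "multiple_of_three": len(seq) % 3 == 0,
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
        "internal_stops_by_frame": internal_stops,
    }


if __name__ == "__main__":
    # Toy sequence, not real data: ATG AAA TAG
    for key, value in cds_report("ATGAAATAG").items():
        print(key, value)
```

A sequence that fails `starts_with_atg` or `multiple_of_three` but has zero internal stops in one of the frames may simply be a partial or frame-shifted gene model rather than junk, which could be a reason to trim it instead of discarding it.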
Thanks.
I tried aligning with TranslatorX. The MAFFT method did produce an alignment, in which the start codon seems to be missing. I'll talk to my boss to see what to do with these sequences. Thanks.