I'm a new user of Maker and I'm seeking assistance with the protocol I'm using. Currently, I'm working on annotating the genome of a non-model ascomycete fungal species belonging to the Sporocadaceae family.
After running the analysis with Maker, I obtained FASTA and GFF files using fasta_merge and gff_merge, respectively. I then renamed my outputs: maker_map_ids to generate the .all.id.map file, map_gff_ids to apply it to the GFF file, and map_fasta_ids to apply it to both the protein and transcript FASTA files.
Now I want to BLAST the predicted proteins against a database, which brings me to my first questions:
Can I download the database from UniProt? If so, which set should I use: the entire SwissProt protein set, SwissProt restricted to the Fungi category, or all fungal proteins available on UniProt (not just the SwissProt ones)?
To check that my pipeline works in the meantime, I went ahead and ran blastp on the proteins predicted by Maker against a protein database downloaded from SwissProt (Fungi category only).
I used this command:
blastp -db swissprot_fungi.fasta -query MYGEN.proteins.fasta -outfmt 5 -evalue 1e-5 -out MYGEN.proteins.xml -num_alignments 5 -num_threads 24
This brings me to my second question. Is there a Python script or a similar tool that can combine the two files, MYGEN.proteins.fasta and MYGEN.proteins.xml, and return a GFF3 file?
In other words, I'm looking for code that does something like this:
python blast2annot.py -i MYGEN.proteins.fasta -b MYGEN.proteins.xml
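To make the request more concrete, here is a minimal sketch of the kind of thing I have in mind. It is only an illustration, not a tested tool: blast2annot.py does not exist, and the function names best_hits and gff3_lines are made up. It assumes the standard BLAST XML (-outfmt 5) layout, keeps only the top hit per query, and writes features in protein coordinates (the query protein is the seqid), so it does not yet attach anything to the genomic features in the Maker GFF. It reads only the XML, because the query IDs are already in there.

```python
import sys
import xml.etree.ElementTree as ET
from urllib.parse import quote


def best_hits(xml_source):
    """Yield (query_id, hit_def, evalue, q_from, q_to) for the top hit of each query."""
    # iterparse streams the file, so large BLAST outputs are not loaded at once.
    for _, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Iteration":
            continue
        fields = (elem.findtext("Iteration_query-def") or "").split()
        hit = elem.find("Iteration_hits/Hit")            # first hit = best hit
        hsp = hit.find("Hit_hsps/Hsp") if hit is not None else None
        if fields and hsp is not None:
            yield (
                fields[0],                               # ID part of the FASTA header
                hit.findtext("Hit_def") or "",
                float(hsp.findtext("Hsp_evalue")),
                int(hsp.findtext("Hsp_query-from")),
                int(hsp.findtext("Hsp_query-to")),
            )
        elem.clear()                                     # free memory for this query


def gff3_lines(xml_source):
    """Yield GFF3 lines (protein coordinates) describing the best hit per query."""
    yield "##gff-version 3"
    for qid, hit_def, evalue, q_from, q_to in best_hits(xml_source):
        # GFF3 reserves ; = & , and % in column 9, so percent-encode them.
        note = quote(hit_def, safe=" :/|()[]")
        attrs = f"ID={quote(qid, safe='')}.blastp;Note={note}"
        yield "\t".join([qid, "blastp", "protein_match", str(q_from),
                         str(q_to), f"{evalue:.2g}", ".", ".", attrs])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python blast2annot.py MYGEN.proteins.xml > MYGEN.hits.gff3
    for line in gff3_lines(sys.argv[1]):
        print(line)
```

What I don't know how to do properly is the next step: merging these hits, by ID, into the mRNA features of the renamed Maker GFF3 so that the annotation ends up on the genome rather than on the proteins.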
Thank you very much for your help.