3 months ago
gilsorek12 • 0

I used minimap2 to align a de novo transcriptome file to a reference genome.

With samtools I converted the minimap2 output to BED, and then wrote my own script to create the GFF file that is passed to MAKER through the est_gff option.

According to this maker-devel topic, the alignment GFF from minimap2 needs to follow the alignment format used by GFF3 (i.e. match/match_part).
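For illustration, the BED-to-GFF step could look roughly like the sketch below. It assumes BED12 input (with the blockSizes and blockStarts columns), approximates the Target coordinates from cumulative block lengths (so indels inside a block are ignored), and always writes "+" as the Target strand, as in the MAKER output lines in this post. The function name and the ID scheme are hypothetical; this is not the script used in this post.

```python
def bed12_to_match_gff3(bed_line, source="minimap2", hit_number=1):
    """Turn one BED12 line into GFF3 expressed_sequence_match/match_part lines.

    Hypothetical helper: the ID scheme and the Target arithmetic are assumptions.
    """
    cols = bed_line.rstrip("\n").split("\t")
    chrom, name, score, strand = cols[0], cols[3], cols[4], cols[5]
    start, end = int(cols[1]), int(cols[2])          # BED: 0-based, half-open
    sizes = [int(x) for x in cols[10].split(",") if x]
    rel_starts = [int(x) for x in cols[11].split(",") if x]

    hit_id = f"{chrom}:hit:{hit_number}"
    total = sum(sizes)                               # aligned transcript length
    lines = ["\t".join([
        chrom, source, "expressed_sequence_match",
        str(start + 1), str(end), score, strand, ".",   # GFF3: 1-based, inclusive
        f"ID={hit_id};Name={name}",
    ])]

    offset = 0                                       # transcript bases consumed so far
    for i, (size, rel) in enumerate(zip(sizes, rel_starts), 1):
        g_start = start + rel + 1
        g_end = start + rel + size
        if strand == "-":
            # On the minus strand the leftmost genomic block is the transcript's end.
            t_end = total - offset
            t_start = t_end - size + 1
        else:
            t_start = offset + 1
            t_end = offset + size
        attrs = (f"ID={hit_id}:hsp:{i};Parent={hit_id};"
                 f"Target={name} {t_start} {t_end} +")
        lines.append("\t".join([
            chrom, source, "match_part",
            str(g_start), str(g_end), score, strand, ".", attrs,
        ]))
        offset += size
    return lines


# Demo line built from the coordinates of one alignment shown in this post.
example_bed = "\t".join([
    "scaffold15014-5", "103439", "103740", "TRINITY_DN55863_c2_g1_i1",
    "1000", "+", "103439", "103740", "0", "2", "156,106", "0,195",
])
for gff_line in bed12_to_match_gff3(example_bed):
    print(gff_line)
```

Every match_part carries a Parent that is exactly the ID of its expressed_sequence_match line, and each ID is used only once, which is what the match/match_part layout relies on.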

I ran three test runs of MAKER to check the GFF I created from minimap2:

  1. I included protein sequences (FASTA) and mRNA sequences, without the transcriptome, as a baseline for test 2.
  2. I included both protein and mRNA sequences and provided the est_gff file that was created from minimap2.
  3. I included all sequences (proteins, mRNA, transcriptome) and let MAKER use BLAST for all alignments.

When I compared the final GFF files from tests 1 and 2, the results were identical. I checked for the est_gff input in the test 2 output, and the file did contain alignments from minimap2:

scaffold15014-5 est_gff:minimap2    expressed_sequence_match    275244  275456  1000    +   .   ID=scaffold15014-5:hit:10067:;Name=TRINITY_DN110156_c0_g2_i1;score=1000
scaffold15014-5 est_gff:minimap2    match_part  275244  275456  1000    +   .   ID=scaffold15014-5:hsp:16084:;Parent=scaffold15014-5:hit:10067:;Target=TRINITY_DN110156_c0_g2_i1 1 213 +;Gap=M213
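A check like this can be scripted by tallying the source and type columns of the final GFF. The sketch below is only an illustration: count_sources is a hypothetical helper, and the two demo lines are shortened versions of the lines above, written with tabs. It shows whether the evidence was passed through to the output, not whether it influenced any gene model.

```python
from collections import Counter


def count_sources(gff_lines):
    """Count features per (source, type) pair in a GFF3 file (hypothetical helper)."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break                      # an embedded sequence section follows
        if line.startswith("#") or not line.strip():
            continue                   # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue                   # not a feature line
        counts[(cols[1], cols[2])] += 1
    return counts


demo_lines = [
    "##gff-version 3",
    "scaffold15014-5\test_gff:minimap2\texpressed_sequence_match\t275244\t275456"
    "\t1000\t+\t.\tID=hit1;Name=TRINITY_DN110156_c0_g2_i1",
    "scaffold15014-5\test_gff:minimap2\tmatch_part\t275244\t275456"
    "\t1000\t+\t.\tID=hsp1;Parent=hit1;Target=TRINITY_DN110156_c0_g2_i1 1 213 +",
]
for (source, ftype), n in sorted(count_sources(demo_lines).items()):
    print(source, ftype, n)
```

For a real file, the same function can be called on an open file handle, for example `count_sources(open(path))`, where `path` is whatever the final GFF is called.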

I think this means that MAKER did not reject the format I provided, but for some reason it did not use the alignments as hints for its annotation predictions. The minimap2 GFF I provided through est_gff looks like this:

scaffold15014-5 minimap2    expressed_sequence_match    103440  103740  1000    +   .   ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Name=TRINITY_DN55863_c2_g1_i1
scaffold15014-5 minimap2    match_part  103440  103595  1000    +   .   ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740:hsp:1;Parent=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Target=TRINITY_DN55863_c2_g1_i1 1 156;
scaffold15014-5 minimap2    match_part  103635  103740  1000    +   .   ID=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740:hsp:2;Parent=scaffold15014-5:TRINITY_DN55863_c2_g1_i1:hit:103440-103740;Target=TRINITY_DN55863_c2_g1_i1 157 262;

Thanks for your consideration and help.

3 months ago
Juke34 ★ 5.8k

Your GFF format is wrong: there are problems in the Parent/ID relationships. I advise you to use AGAT to create your GFF file.
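As a rough way to look for this kind of problem before handing a file to MAKER, the sketch below runs a few generic checks: nine tab-separated columns, unique IDs, and every Parent pointing at an ID that exists in the file. The helper names are hypothetical, the checks are assumptions about what commonly goes wrong, and they do not replace AGAT or identify the specific issue in the file above.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict, ignoring empty fields."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = value
    return attrs


def check_parent_id(gff_lines):
    """Return a list of human-readable problems found in the ID/Parent links."""
    problems = []
    seen_ids = set()
    records = []                        # (line number, feature type, Parent value)
    for n, line in enumerate(gff_lines, 1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append(f"line {n}: expected 9 tab-separated columns, found {len(cols)}")
            continue
        attrs = parse_attributes(cols[8])
        feature_id = attrs.get("ID")
        if feature_id is None:
            problems.append(f"line {n}: no ID attribute")
        elif feature_id in seen_ids:
            problems.append(f"line {n}: duplicate ID {feature_id}")
        else:
            seen_ids.add(feature_id)
        records.append((n, cols[2], attrs.get("Parent")))

    # Second pass, so a Parent may refer to a feature defined later in the file.
    for n, ftype, parent in records:
        if parent is None:
            if ftype == "match_part":
                problems.append(f"line {n}: match_part without a Parent")
            continue
        for parent_id in parent.split(","):
            if parent_id not in seen_ids:
                problems.append(f"line {n}: Parent {parent_id} matches no ID")
    return problems


good = [
    "s1\tminimap2\texpressed_sequence_match\t10\t50\t.\t+\t.\tID=hit1;Name=t1",
    "s1\tminimap2\tmatch_part\t10\t50\t.\t+\t.\tID=hit1:hsp1;Parent=hit1;Target=t1 1 41 +",
]
bad = [good[0], good[1].replace("Parent=hit1;", "Parent=hit2;")]
print(check_parent_id(good))
print(check_parent_id(bad))
```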

Thank you! The results were much better using AGAT.


