Question regarding Busco full_table.tsv report
17 months ago

Hi there. For context, I built a transcriptome for my species: I first mapped my reads to the genome and then used StringTie to get the transcript sequences. I evaluated it with BUSCO (using the eudicots_odb10 dataset), running it not only on the full transcriptome but also on a set of sequences containing only the longest transcript of each gene. I noticed, however, that when I used only the longest transcripts, the percentage of fragmented genes increased a lot:

Dataset                                     Lineage         Complete  Single-copy  Duplicated  Fragmented  Missing  Total BUSCOs
StringTie transcriptome                     eudicots_odb10  97.80%    26.20%       71.60%      0.60%       1.60%    2326 (100%)
StringTie transcriptome longest transcript  eudicots_odb10  83.10%    80.00%       3.10%       7.40%       9.50%    2326 (100%)


I wanted to look at the full_table.tsv file to understand why that happens, and I noticed that in some cases the value in the Length column is smaller in the longest-transcript dataset, which is counterintuitive at first. I'm also having trouble understanding where these "length" values come from, and the BUSCO documentation did not help.
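
The side-by-side comparison of the two full_table.tsv files can be sketched in Python as below. This is only a minimal sketch, not part of BUSCO itself. It assumes each table has a commented header line starting with "# Busco id" that names a "Status" and a "Length" column, that Missing rows may carry only the first two fields, and that a Duplicated BUSCO appears on several rows. The function names read_full_table and compare are made up for this example.

```python
def read_full_table(lines):
    """Parse full_table.tsv lines into {busco_id: [(status, sequence, length), ...]}.

    Assumption: the column names come from the commented '# Busco id ...' line.
    """
    columns = None
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("# ").split("\t")
            if fields[0].strip().lower() == "busco id":
                columns = [f.strip().lower() for f in fields]
            continue
        if columns is None:
            raise ValueError("no '# Busco id ...' header line found")
        # Missing rows are shorter than the header; zip simply stops early.
        record = dict(zip(columns, line.split("\t")))
        raw_length = record.get("length")
        length = int(raw_length) if raw_length else None
        table.setdefault(record["busco id"], []).append(
            (record.get("status"), record.get("sequence"), length)
        )
    return table


def compare(full, longest):
    """List BUSCOs whose status or largest reported length differs between runs."""
    changed = []
    for busco_id in sorted(full.keys() & longest.keys()):
        rows_a, rows_b = full[busco_id], longest[busco_id]
        status_a, status_b = rows_a[0][0], rows_b[0][0]
        # A Duplicated BUSCO has several rows, so take the largest length.
        len_a = max((r[2] for r in rows_a if r[2] is not None), default=None)
        len_b = max((r[2] for r in rows_b if r[2] is not None), default=None)
        if status_a != status_b or len_a != len_b:
            changed.append((busco_id, status_a, len_a, status_b, len_b))
    return changed
```

Both functions take any iterable of lines, so an open file handle for each run's full_table.tsv can be passed directly. Each returned tuple holds the BUSCO id followed by the status and largest length in the full run, then the same two values in the longest-transcript run, which makes it easy to see which transcript was matched in each case.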

