Hi there. For context, I built a transcriptome for my species by first mapping my reads to the genome and then using StringTie to get the transcript sequences. I evaluated it with BUSCO (using the eudicots_odb10 dataset), both on the full transcriptome and on a set of sequences containing only the longest transcript from each gene. I noticed, however, that when I used only the longest transcripts, the percentage of fragmented genes increased a lot:
Dataset                                      Lineage         Complete  Single-copy  Duplicated  Fragmented  Missing  Total
StringTie transcriptome                      eudicots_odb10  97.80%    26.20%       71.60%      0.60%       1.60%    2326 (100%)
StringTie transcriptome, longest transcript  eudicots_odb10  83.10%    80.00%       3.10%       7.40%       9.50%    2326 (100%)
I wanted to look at the full_table.tsv file to understand why that happens, and I noticed that in some cases the value in the length column is smaller in the longest-transcript dataset, which is counterintuitive at first. I'm also having trouble understanding where these "length" values come from, and the BUSCO documentation did not help.
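For reference, the comparison I mean can be sketched roughly like this. It is only a sketch under assumptions: it assumes full_table.tsv is tab-separated with the columns Busco id, Status, Sequence, Score, Length in that order (any further columns are ignored), that header lines start with "#", and that Missing BUSCOs have no length field. The function names and the file paths in the usage comment are hypothetical.

```python
import csv


def read_lengths(table_path):
    """Return {busco_id: (status, length)} from a BUSCO full_table.tsv.

    Assumed column order: Busco id, Status, Sequence, Score, Length.
    """
    best = {}
    with open(table_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank lines and header/comment lines starting with '#'.
            if not row or row[0].startswith("#"):
                continue
            # Rows without a length value (e.g. Missing BUSCOs) are skipped.
            if len(row) < 5 or not row[4].strip():
                continue
            busco_id, status = row[0], row[1]
            length = int(float(row[4]))
            # Duplicated BUSCOs span several rows; keep the longest match.
            if busco_id not in best or length > best[busco_id][1]:
                best[busco_id] = (status, length)
    return best


def shrunken_lengths(full_path, longest_path):
    """List BUSCOs whose length is smaller in the longest-transcript run."""
    full = read_lengths(full_path)
    longest = read_lengths(longest_path)
    rows = []
    for busco_id in sorted(full.keys() & longest.keys()):
        full_status, full_len = full[busco_id]
        long_status, long_len = longest[busco_id]
        if long_len < full_len:
            rows.append((busco_id, full_status, full_len, long_status, long_len))
    return rows


# Hypothetical usage (paths are placeholders, not my real ones):
# for row in shrunken_lengths("full_run/full_table.tsv",
#                             "longest_run/full_table.tsv"):
#     print(*row, sep="\t")
```

Keeping only the longest row per BUSCO id in the full run is my own choice, so that a Duplicated BUSCO is compared against its best match rather than an arbitrary one.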
Thanks in advance!