After running StringTie on HISAT2 alignments of a non-model RNA-seq data set (Argentine ant), I realised that some StringTie transcripts were assigned to multiple genes (see example below), which is causing many problems down the line.
24887 MSTRG.1473 LOC105670921
24920 MSTRG.1473 LOC105670793
25000 MSTRG.1473 LOC105670784
...
27182 MSTRG.1603 LOC105671758
27194 MSTRG.1603 LOC105671753
Because I could not find any score that would help me select the best match between a StringTie transcript and its assigned genes, my first approach (although I knew it was wrong) was to keep the first pair and discard the others. But I then realised that this is a widespread issue in my dataset, so I don't feel comfortable doing this at all.
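To put a number on "widespread", here is a minimal sketch that counts how many reference genes each StringTie ID is paired with. It assumes a whitespace-separated, three-column table (row index, StringTie ID, reference gene) like the excerpt above; the function name `genes_per_stringtie_id` is made up for illustration, and only the five rows shown in this post are used as input.

```python
from collections import defaultdict

# The five rows shown in the post: row index, StringTie ID, reference gene.
rows = """\
24887 MSTRG.1473 LOC105670921
24920 MSTRG.1473 LOC105670793
25000 MSTRG.1473 LOC105670784
27182 MSTRG.1603 LOC105671758
27194 MSTRG.1603 LOC105671753
"""


def genes_per_stringtie_id(text):
    """Map each StringTie ID to the set of reference genes paired with it."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            # Skip blank lines or anything that is not a 3-column row.
            continue
        _, mstrg_id, ref_gene = fields
        mapping[mstrg_id].add(ref_gene)
    return mapping


mapping = genes_per_stringtie_id(rows)

# Keep only the StringTie IDs that are paired with more than one gene.
ambiguous = {k: sorted(v) for k, v in mapping.items() if len(v) > 1}

for mstrg_id, genes in sorted(ambiguous.items()):
    print(mstrg_id, len(genes), ",".join(genes))
```

Run on the full table instead of these five rows, `len(ambiguous) / len(mapping)` would give the fraction of StringTie IDs affected. This only measures the problem; it does not decide which gene is the right match.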
Another detail that may be useful: I had a look at what these genes are. They do not seem to be homologous, but they always seem to be located in the same genomic region. See for yourself in the example below with MSTRG.1473.
Is there any proper way to deal with that?
MSTRG.1473 is simultaneously assigned to these three genes circled in green: