Question: How to deal with single transcripts assigned to multiple genes?
Asked 14 months ago by antoinefelden:

After running StringTie on HISAT2 alignments of a non-model RNA-seq data set (Argentine ant), I realised that some StringTie transcripts were assigned to multiple genes (see example below), which is causing many problems down the line.

24887   MSTRG.1473 LOC105670921
24920   MSTRG.1473 LOC105670793
25000   MSTRG.1473 LOC105670784
...
27182   MSTRG.1603 LOC105671758
27194   MSTRG.1603 LOC105671753

Because I could not find any score that would help me select the best match between a StringTie transcript and its assigned genes, my first approach (although I knew it was wrong) was to keep the first pair and discard the others. But I realised that the issue is widespread in my dataset, so I don't feel comfortable doing this at all.
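Before deciding how to handle these cases, it can help to measure how widespread they are. The following is a minimal sketch, assuming the table above is available as whitespace-separated text with three columns (row index, MSTRG ID, reference gene name). The function names `genes_per_locus` and `ambiguous_loci` are hypothetical helpers introduced only for this illustration; they are not part of StringTie.

```python
from collections import defaultdict

# Rows copied from the example above: row index, MSTRG ID, reference gene name.
ROWS = """\
24887   MSTRG.1473 LOC105670921
24920   MSTRG.1473 LOC105670793
25000   MSTRG.1473 LOC105670784
27182   MSTRG.1603 LOC105671758
27194   MSTRG.1603 LOC105671753
"""


def genes_per_locus(text):
    """Map each MSTRG ID to the sorted list of distinct genes paired with it."""
    genes = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _, mstrg_id, ref_gene = fields
        genes[mstrg_id].add(ref_gene)
    return {mstrg_id: sorted(names) for mstrg_id, names in genes.items()}


def ambiguous_loci(mapping):
    """Keep only the MSTRG IDs that are paired with more than one gene."""
    return {k: v for k, v in mapping.items() if len(v) > 1}


mapping = genes_per_locus(ROWS)
ambiguous = ambiguous_loci(mapping)
for mstrg_id, ref_genes in ambiguous.items():
    print(mstrg_id, len(ref_genes), ",".join(ref_genes))
```

Run on the full table, this gives the number of affected MSTRG IDs and the genes involved, which makes it easier to judge whether keeping only the first pair would throw away a meaningful share of the data.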

Another detail that may be useful: I had a look at what these genes were. They do not seem to be homologous, but they always seem to be located in the same genomic region. See for yourself in the example below with MSTRG.1473.

Is there any proper way to deal with that?

MSTRG.1473 is simultaneously assigned to these three genes circled in green:

Tags: hisat2, rna-seq, stringtie
Answer, 14 months ago by antoinefelden:

Okay, I solved that problem, which turned out to be not really a problem but a feature of StringTie.

See https://github.com/gpertea/stringtie/issues/170

In a nutshell, there is an alternative, simpler StringTie pipeline that skips the assembly step. It simply assigns the mapped reads to the already-annotated transcripts, without looking for novel transcripts (which were what matched several loci in the case discussed above). It's quick and dirty because it discards a lot of potentially interesting data, but that's what I wanted.
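For readers who want to try this, here is a minimal sketch of how such a run could be launched. It assumes that the simpler pipeline corresponds to StringTie's `-e` option (estimate abundance of the reference transcripts only), which must be combined with `-G` to supply the reference annotation; the linked issue should be checked to confirm this is the intended usage. The helper `stringtie_quant_command` and all file names are hypothetical placeholders.

```python
import shlex


def stringtie_quant_command(bam, reference_gtf, out_gtf, ballgown=False):
    """Build a StringTie command line that skips assembly of novel transcripts.

    -e  restricts estimation to the reference transcripts (requires -G)
    -G  reference annotation (GTF/GFF)
    -o  output GTF with the abundance estimates
    -B  optionally also write Ballgown input tables
    """
    cmd = ["stringtie", bam, "-e", "-G", reference_gtf, "-o", out_gtf]
    if ballgown:
        cmd.append("-B")
    return cmd


# Hypothetical file names; replace with your own sorted BAM and annotation.
cmd = stringtie_quant_command(
    "sample1.sorted.bam", "reference.gtf", "sample1.quant.gtf"
)
print(shlex.join(cmd))

# To actually run it (requires StringTie on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Because no new loci are assembled in this mode, the output should only contain gene identifiers from the reference annotation, so the one-to-many MSTRG assignments described in the question should not arise.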
