Hi,
I'm still confused about StringTie. This is very similar to my previous post on the topic, but I haven't been able to get much further, so I thought I'd add an example and expand the explanation. If this is not allowed, please let me know and I'll delete this (or the previous) post.
So from what I know so far:
- StringTie can run in "de novo" mode, i.e. without the -G option. In this case, it will only output novel isoforms. All isoforms found have a STRG prefix (or MSTRG if run with stringtie --merge ..., which is what I'm using).
- If the -G option is provided, the output also includes all the reference isoforms/transcripts (since the reference GTF file contains that information). This means we get isoforms with the STRG/MSTRG prefix (i.e. novel ones), and also ones carrying a gene name (e.g. ref_gene_name), which would be the ones belonging to the reference.
- My understanding is that if the -G option is not provided, the output won't include any isoforms from the reference set, but we also won't get any annotations such as ref_gene_name/ref_gene_id.
What I want is to keep only novel isoforms, i.e. ones that aren't already present in the reference GTF. However, if a novel isoform comes from an annotated genic region (e.g. MMP8), it should be annotated as such (e.g. named MMP8-MSTRG1.2, or carrying the ref_gene_name field).
In short, the output should contain only new isoforms of known genes (e.g. MMP8-MSTRG1.2, which is not in the reference set) and novel isoforms from non-annotated regions (these would only be labelled with e.g. MSTRG1.1 and have no ref_gene_name field).
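To make the goal concrete, here is a rough Python sketch of the filtering I have in mind, applied to a GTF produced with -G. It rests on assumptions I haven't verified: that novel transcripts can be recognised purely by a STRG/MSTRG prefix on transcript_id, that reference transcripts carry a ref_gene_name (or gene_name) attribute, and that a novel isoform shares its gene_id with the reference transcripts of the same locus. The function name novel_isoforms, the REF_TX_1 ID, and the example lines are made up for illustration.

```python
import re

# Assumption: novel transcripts are identified only by this ID prefix.
NOVEL_PREFIXES = ("STRG", "MSTRG")
# Matches key "value" pairs in the 9th GTF column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def novel_isoforms(gtf_lines):
    """Return (transcript_id, gene_name or None) for each novel transcript."""
    transcripts = []  # (transcript_id, gene_id, gene name on that line or None)
    gene_names = {}   # gene_id -> gene name seen on any transcript of the locus
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "transcript":
            continue  # skip exon rows and malformed lines
        attrs = dict(ATTR_RE.findall(cols[8]))
        name = attrs.get("ref_gene_name") or attrs.get("gene_name")
        gene_id = attrs.get("gene_id")
        if name and gene_id:
            gene_names.setdefault(gene_id, name)
        transcripts.append((attrs.get("transcript_id", ""), gene_id, name))
    # Keep only novel IDs; borrow the gene name from the locus if needed.
    return [
        (tid, name or gene_names.get(gid))
        for tid, gid, name in transcripts
        if tid.startswith(NOVEL_PREFIXES)
    ]


# Made-up example lines, not real StringTie output.
example = [
    'chr11\tStringTie\ttranscript\t100\t900\t.\t-\t.\t'
    'gene_id "MSTRG.1"; transcript_id "REF_TX_1"; ref_gene_name "MMP8";',
    'chr11\tStringTie\ttranscript\t100\t950\t.\t-\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.2";',
    'chr11\tStringTie\texon\t100\t300\t.\t-\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.2";',
    'chr2\tStringTie\ttranscript\t500\t800\t.\t+\t.\t'
    'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1";',
]

result = novel_isoforms(example)
labels = [f"{name}-{tid}" if name else tid for tid, name in result]
print(labels)  # ['MMP8-MSTRG.1.2', 'MSTRG.2.1']
```

The reference transcript is dropped, the novel MMP8 isoform picks up the gene name from its locus, and the isoform from the non-annotated region keeps only its MSTRG ID. Whether StringTie's real output actually satisfies the assumptions above is part of what I'm unsure about.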
Is this possible, or is my understanding/usage of StringTie incorrect?
Thanks in advance.