I'm using StringTie with Ensembl annotations (GTF file downloaded from Ensembl FTP --> Gene sets --> GTF) and I'm having an issue with exon variants that have slightly different genomic positions. Some exons have start positions that differ by as little as 1 bp (e.g. one starts at 1001, another at 1002), and the same goes for the stop positions. As a result, StringTie gives me two different coverage values, one for each exon. I would like to treat two such exons as one and the same, and I'm wondering how to go about it.
I can't find a suitable option in the StringTie manual, so I'm considering altering the annotation instead: finding exons with very small differences in start or stop positions, keeping only the lowest start position and the highest stop position within each such group, and re-running StringTie with the new annotation. Is there something obviously flawed with this approach?
Does anyone know of a way to either:
- Make StringTie treat almost-identical exons as one and the same exon, or
- Change the annotation to only contain the longest variant of each exon?
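
To make the second option concrete, here is a minimal Python sketch of what I have in mind. It is not anything StringTie provides: the function names `harmonize_exons` and `rewrite_gtf_lines` and the tolerance parameter `tol` are my own, and the 2 bp default is an arbitrary assumption. It only assumes the standard 9-column, tab-separated GTF layout (column 3 = feature, 4 = start, 5 = end, 7 = strand). Note that only `exon` rows are changed; transcript, CDS and UTR rows are left as they are, so they may end up inconsistent with the adjusted exons.

```python
from collections import defaultdict


def harmonize_exons(exons, tol=2):
    """Map each (chrom, strand, start, end) exon to merged coordinates.

    Exons on the same chromosome and strand are grouped when both their
    start and end lie within `tol` bp of the first exon in the group.
    Every exon in a group gets the group's lowest start and highest end.
    """
    by_locus = defaultdict(set)
    for chrom, strand, start, end in exons:
        by_locus[(chrom, strand)].add((start, end))

    mapping = {}
    for (chrom, strand), coords in by_locus.items():
        clusters = []  # all groups; the first item of each is its anchor
        active = []    # groups whose anchor start is still within `tol`
        for start, end in sorted(coords):
            # Sorted by start, so older groups can be dropped for good.
            active = [c for c in active if start - c[0][0] <= tol]
            for cluster in active:
                # Start is already within `tol`; only the end needs checking.
                if abs(end - cluster[0][1]) <= tol:
                    cluster.append((start, end))
                    break
            else:
                cluster = [(start, end)]
                clusters.append(cluster)
                active.append(cluster)
        for cluster in clusters:
            new_start = min(s for s, _ in cluster)
            new_end = max(e for _, e in cluster)
            for start, end in cluster:
                mapping[(chrom, strand, start, end)] = (new_start, new_end)
    return mapping


def rewrite_gtf_lines(lines, tol=2):
    """Return GTF lines with near-identical exon coordinates unified."""
    rows = [line.rstrip("\n").split("\t") for line in lines]

    def is_exon(row):
        return len(row) == 9 and row[2] == "exon"

    exons = [(r[0], r[6], int(r[3]), int(r[4])) for r in rows if is_exon(r)]
    mapping = harmonize_exons(exons, tol)
    out = []
    for r in rows:
        if is_exon(r):
            new_start, new_end = mapping[(r[0], r[6], int(r[3]), int(r[4]))]
            r[3], r[4] = str(new_start), str(new_end)
        out.append("\t".join(r))  # header/comment lines pass through unchanged
    return out
```

Usage would be something like reading the Ensembl GTF with `open(path)`, passing the lines to `rewrite_gtf_lines`, writing the result to a new file, and giving that file to StringTie as the annotation. Grouping against the first exon of each group (rather than chaining neighbours) is deliberate, so a run of exons each shifted by 1 bp cannot drift into one very long merged exon.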