I have just spent quite a while trying to figure this out and finally solved the mystery: these odd lines refer to tRNAs and pseudo_tRNAs.
The descriptions of the analyses on the ENCODE website make no mention of any such features. To figure out what they were, I looked at the files that ENCODE's pipelines use as input to RSEM. The metadata table associated with my files says they used annotation 'M4'. I went to ENCODE's 'Reference Sequences' page and looked at this M4 annotation, but found that every feature in the file was of the format
It was only when I started digging through random annotation files on the ENCODE portal, such as this example, that I found the association between these values and the tRNAs.
For example, this is a snippet from the above-linked file:
I'm not entirely sure why these features are included in the output files. I suspect it may be a mistake (and if it isn't, the analysis descriptions should make this clearer).
So for most analyses where you don't care about tRNAs, I reckon you can just delete the lines. Hope this answer saves some time for future explorers.
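If you'd rather not delete the lines by hand, here is a minimal sketch of doing it in Python. It assumes you have already collected the IDs of the tRNA/pseudo_tRNA rows (e.g. from the annotation file mentioned above) into a set, and that the first tab-separated column of the RSEM `.results` file holds the feature ID, as it does in RSEM's gene and isoform tables. The function names and the example IDs below are hypothetical placeholders, not part of ENCODE's pipeline.

```python
from typing import Iterable, Iterator, Set


def drop_feature_rows(lines: Iterable[str], ids_to_drop: Set[str]) -> Iterator[str]:
    """Yield the lines of an RSEM *.results table, keeping the header and
    skipping every row whose first column (the feature ID) is in ids_to_drop."""
    it = iter(lines)
    header = next(it, None)
    if header is None:  # empty input: nothing to yield
        return
    yield header
    for line in it:
        # The feature ID is everything before the first tab.
        feature_id = line.split("\t", 1)[0].strip()
        if feature_id not in ids_to_drop:
            yield line


def filter_results_file(in_path: str, out_path: str, ids_to_drop: Set[str]) -> None:
    """Hypothetical helper: write a copy of in_path without the unwanted rows."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(drop_feature_rows(src, ids_to_drop))


if __name__ == "__main__":
    # Hypothetical sample rows; replace with a real results file and real IDs.
    sample = ["gene_id\tTPM\n", "GENE_A\t10.0\n", "TRNA_X\t5.0\n"]
    print("".join(drop_feature_rows(sample, {"TRNA_X"})), end="")
```

One caveat: this only removes rows and leaves the remaining values untouched, so the TPM column will no longer sum to one million. If your downstream analysis relies on that, renormalize the remaining rows after filtering.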