Question: lncRNA Detection with Tuxedo Suite
13 months ago
27atcggcta27


My question focuses on the usage of StringTie.

I am a student completing an independent study in which I am trying to detect novel lncRNAs from RNA-Seq data (4 samples in triplicate, for a total of 12 FASTQ files). I'm using HISAT2 for alignment and StringTie for assembly and abundance estimation. I've already generated a GTF file for each replicate, and I'm currently at the abundance estimation step of my project.
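A minimal sketch of the per-sample workflow described above, written as a dry run that only prints the commands. The index prefix `genome_index`, the sample name, and the assumption of single-end reads are all placeholders that are not stated in the post; adjust them to your own data.

```shell
# Dry run: prints the per-sample commands instead of executing them.
# Assumptions (not from the post): single-end reads, a HISAT2 index
# prefix called "genome_index", and hypothetical sample names.
INDEX="genome_index"

plan_sample() {
    sample="$1"
    # --dta makes HISAT2 report alignments tailored for transcript assemblers
    echo "hisat2 --dta -x $INDEX -U ${sample}.fastq -S ${sample}.sam"
    # StringTie expects coordinate-sorted alignments
    echo "samtools sort -o ${sample}.bam ${sample}.sam"
    # Per-sample assembly, producing one GTF per replicate
    echo "stringtie ${sample}.bam -o ${sample}.gtf"
}

plan_sample sample1
```

Calling `plan_sample` once per replicate would list the commands that yield the 12 per-replicate GTF files used as input to the merge step.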

I'm currently working with StringTie's --merge mode. It is my understanding that merge mode generates a non-redundant GTF file from the GTF files generated for each sample, together with the reference annotation if one is included. This merged GTF file is then used as the annotation when determining DE. From the StringTie manual, I see that I have the following options:

-G <guide_gff> reference annotation to include in the merging (GTF/GFF3)

-m <min_len> minimum input transcript length to include in the merge (default: 50)

-i keep merged transcripts with retained introns (default: these are not kept unless there is strong evidence for them)

So, since I am trying to detect novel lncRNAs and do not really care about the DE of annotated genes, would it be advisable for me to do the following?

  • set the minimum transcript length to 200 nt, since that is the minimum length of a lncRNA (-m option)
  • not use an additional reference annotation, because I am only looking for novel lncRNAs (-G option)
  • keep all retained introns, because these could lead to the non-coding/loss-of-function characteristics of lncRNAs (-i option)
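For concreteness, the merge call proposed in the list above could be sketched as follows. This is a dry run that only prints the command; the GTF file names, `mergelist.txt`, and `merged.gtf` are hypothetical, and the sketch shows the settings being asked about rather than a recommendation.

```shell
# Hypothetical per-replicate GTF names; replace with your own files.
MERGE_LIST="mergelist.txt"
printf '%s\n' sample1.gtf sample2.gtf sample3.gtf > "$MERGE_LIST"

# -m 200 : only include input transcripts of at least 200 nt
# -i     : keep merged transcripts with retained introns
# no -G  : no reference annotation is included in the merge
MERGE_CMD="stringtie --merge -m 200 -i -o merged.gtf $MERGE_LIST"

# Dry run: print the command rather than executing it.
echo "$MERGE_CMD"
```

Adding `-G` followed by a reference GTF to the command would give the variant that includes the reference annotation, which makes the two setups easy to compare.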

Thank you in advance for any suggestions! I have never posted on this site before, so please tell me if I need to provide more information or follow any other guidelines.

note: edited for clarity

modified 12 months ago • written 13 months ago by 27atcggcta27

Hey, this question is relevant for me too! Were you able to come to a conclusion about whether any of these options are good for such an analysis?

written 10 months ago by c_u
