Question: How can I get novel long non-coding RNAs from RNA-seq data?
eli0 wrote:

Hi all, I have analyzed my RNA-seq data with the following tools. I downloaded the sequences (SRA) from the NCBI database, checked the sequencing quality with FastQC, and ran Trimmomatic on each raw library to remove sequencing adapters and low-quality reads. I used the paired-end data as input to HISAT, counted reads with htseq-count, and used the DESeq2 package in Galaxy to get up- and down-regulated genes. Now I don't know how I can get novel lncRNAs.

I need help. Thank you in advance.

Bergen, Norway
Michael Dondrup wrote:

Truly novel transcripts are not part of the annotation, so you haven't counted or encountered any with your htseq-count/DESeq2 pipeline. You could use Trinity in genome-guided assembly mode. Then you could compare the generated transcripts with the annotated transcripts, filter out those that overlap existing transcripts near-identically, and annotate the rest for their coding potential, e.g. with TransDecoder + Trinotate. Those that come out without a "good" coding region and, e.g., without blastx similarity to known proteins in GenBank or without InterPro domains could be candidates for a list of a) novel transcripts and b) non-coding transcripts.
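
To make the filtering logic concrete, here is a minimal Python sketch of the final classification step. It assumes you have already collected, for each assembled transcript, the results of the annotation comparison, ORF prediction, blastx, and InterPro searches into one record. The `Transcript` fields, the `classify` function, and the two thresholds (200 nt minimum length, 100 aa maximum ORF) are illustrative assumptions and not output formats or settings of Trinity, TransDecoder, or Trinotate, so adjust them to your own data.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical per-transcript summary; fill these from your own tool outputs.
    transcript_id: str
    length: int                 # transcript length in nucleotides
    overlaps_annotation: bool   # near-identical overlap with an annotated transcript
    longest_orf_aa: int         # longest predicted ORF, in amino acids
    has_blastx_hit: bool        # similarity to a known protein
    has_interpro_domain: bool   # at least one InterPro domain found


def classify(t: Transcript, min_length: int = 200, max_orf_aa: int = 100) -> str:
    """Sort a transcript into a rough category (thresholds are assumptions)."""
    if t.overlaps_annotation:
        return "known"
    # Any evidence of coding potential removes it from the non-coding list.
    if t.longest_orf_aa >= max_orf_aa or t.has_blastx_hit or t.has_interpro_domain:
        return "novel_coding"
    # "Long" non-coding RNAs are conventionally longer than about 200 nt.
    if t.length < min_length:
        return "novel_short"
    return "novel_lncRNA_candidate"


def lncrna_candidates(transcripts):
    """Return the IDs of transcripts classified as novel lncRNA candidates."""
    return [t.transcript_id for t in transcripts
            if classify(t) == "novel_lncRNA_candidate"]


if __name__ == "__main__":
    # Made-up example records, only to show how the function is called.
    examples = [
        Transcript("T1", 1500, True, 300, True, True),
        Transcript("T2", 900, False, 40, False, False),
        Transcript("T3", 1200, False, 250, True, False),
        Transcript("T4", 150, False, 20, False, False),
    ]
    for t in examples:
        print(t.transcript_id, classify(t))
```

Anything this reports as a candidate is only a candidate: it has passed the filters, which is not the same as being validated as an lncRNA.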

Which genome is this by the way? For well-annotated model organisms that approach might not yield much.

Thanks for your help. The genome I used was Zea mays. Can I use the DESeq2 results to figure out which gene IDs are coding and which are non-coding?

