Question: Identifying differentially expressed lncRNAs from RNA-Seq data
Bioinfo wrote (4 days ago):

Hi

I'm very new to lncRNA analysis. I'm using the HISAT2, StringTie and Ballgown pipeline for differential expression analysis.

If I'm only looking for lncRNAs, I will use the GENCODE lncRNA annotation [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz].

First question:

For example, from this paper [https://www.nature.com/articles/nprot.2016.095]:

HISAT2 command (map the reads for each sample to the reference genome):

hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam

StringTie command (assemble transcripts for each sample):

stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam
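Note that hisat2 writes SAM (`-S`) while the stringtie command reads a BAM file; StringTie needs coordinate-sorted alignments, so a `samtools sort` step sits between the two commands. The minimal Python sketch below strings the three steps together for one sample, reusing the chrX paths from the commands above. It only prints the commands unless `dry_run=False`; the helper names `sample_commands` and `run` are hypothetical, and it assumes `hisat2`, `samtools` and `stringtie` are on your `PATH`.

```python
import shlex
import subprocess


def sample_commands(sample, index="chrX_data/indexes/chrX_tran",
                    gtf="chrX_data/genes/chrX.gtf", threads=8):
    """Build the hisat2 -> samtools sort -> stringtie commands for one sample."""
    t = str(threads)
    fq1 = f"chrX_data/samples/{sample}_chrX_1.fastq.gz"
    fq2 = f"chrX_data/samples/{sample}_chrX_2.fastq.gz"
    sam, bam = f"{sample}_chrX.sam", f"{sample}_chrX.bam"
    return [
        # Align; --dta tailors the alignments for transcript assemblers.
        ["hisat2", "-p", t, "--dta", "-x", index,
         "-1", fq1, "-2", fq2, "-S", sam],
        # hisat2 writes SAM, but StringTie expects a coordinate-sorted BAM.
        ["samtools", "sort", "-@", t, "-o", bam, sam],
        # Reference-guided assembly: -G is where the annotation GTF goes,
        # -l sets the name prefix of the assembled transcripts.
        ["stringtie", "-p", t, "-G", gtf, "-o", f"{sample}_chrX.gtf",
         "-l", sample, bam],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(sample_commands("ERR188044"))
```

In this pipeline the annotation GTF enters only at StringTie's `-G`.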

At which step do I need to use the GTF file above (the GENCODE lncRNA annotation) [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]: the HISAT2 step or the StringTie step?

Second question:

If I want both protein-coding RNAs and lncRNAs, which GTF file from GENCODE [http://www.gencodegenes.org/releases/current.html] should I use?

That way, after differential expression analysis I will have both differentially expressed protein-coding RNAs and lncRNAs. How can I filter out only the lncRNAs?

It would be very helpful if you could clear up my doubts. Thank you.

Devon Ryan (Freiburg, Germany) wrote (4 days ago):

Use the GTF file with hisat2 (unless you already built the index with those splice sites) and skip stringtie entirely: you're not looking for novel lncRNAs, so it doesn't benefit you. Instead, count reads with featureCounts and then test for differential expression with DESeq2, edgeR, or limma/voom.
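A sketch of the counting step, assuming Subread's `featureCounts` is installed and the BAM files come from hisat2. With its defaults, featureCounts counts reads over `exon` features and sums them per `gene_id`, which gives the gene-level table that DESeq2, edgeR or limma/voom expect. The function name and file names below are illustrative, not part of any tool.

```python
import shlex


def featurecounts_command(gtf, bams, out="counts.txt", threads=8, paired=True):
    """Gene-level counting: exons (-t exon, the default) summed per gene_id."""
    cmd = ["featureCounts", "-T", str(threads), "-a", gtf, "-o", out]
    if paired:
        # Paired-end mode; check `featureCounts -h` for how your Subread
        # version counts read pairs versus individual reads.
        cmd.append("-p")
    return cmd + list(bams)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = featurecounts_command("gencode.annotation.gtf",
                                ["sample1.bam", "sample2.bam"])
    print(shlex.join(cmd))
```

After a leading comment line, `counts.txt` has the columns Geneid, Chr, Start, End, Strand and Length followed by one count column per BAM file; those per-sample columns form the count matrix for the DE package.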

Note that hisat2 can't take the GTF file directly: you first have to convert it into a list of splice sites. This is done with hisat2_extract_splice_sites.py, which comes with hisat2.
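To make the conversion concrete, here is a simplified Python re-implementation of what `hisat2_extract_splice_sites.py` produces: one tab-separated line per intron with the chromosome, the 0-based positions of the exonic bases flanking the intron on the left and right, and the strand. It is an illustration only, assuming a well-formed GTF with `transcript_id` on every exon line; the bundled script may differ in edge cases (for example, how very short gaps between exons are handled), so use that script for real runs.

```python
import re
import sys
from collections import defaultdict

# Matches quoted GTF attributes such as: transcript_id "T1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def extract_splice_sites(gtf_lines):
    """Return sorted (chrom, left, right, strand) tuples, one per intron.

    left is the 0-based position of the last base of the upstream exon and
    right the 0-based position of the first base of the downstream exon.
    """
    transcripts = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        tid = dict(ATTR_RE.findall(fields[8])).get("transcript_id")
        if tid is None:
            continue
        # GTF coordinates are 1-based and inclusive; shift to 0-based.
        start, end = int(fields[3]) - 1, int(fields[4]) - 1
        transcripts[tid].append((start, end, fields[0], fields[6]))

    junctions = set()
    for exons in transcripts.values():
        exons.sort()  # order exons by start position
        for (_, left, chrom, strand), (right, _, _, _) in zip(exons, exons[1:]):
            if right - left > 1:  # skip abutting or overlapping exons
                junctions.add((chrom, left, right, strand))
    return sorted(junctions)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for chrom, left, right, strand in extract_splice_sites(handle):
            print(f"{chrom}\t{left}\t{right}\t{strand}")
```

Junctions shared by several transcripts are reported once, because they are collected in a set.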


No, I am also looking for novel lncRNAs. Just so I know: how should the command be given if I need to use the GTF file with hisat2?

(Bioinfo, 4 days ago)

If you're looking for novel lncRNAs then you'll need the GTF for both (unless you built the hisat2 index with it, in which case you only need it with stringTie).

(Devon Ryan, 4 days ago)

You mean that for hisat2 I need to create index files for GRCh38, and the GTF file will be used for StringTie. Am I right? How can I create the index files?

(Bioinfo, 4 days ago)

I'll try this one last time. Your options are below:

  1. Build a hisat2 index with the splice sites from the GTF file (see the hisat2 documentation for details).
  2. Build or download a genomic hisat2 index and then use the --known-splicesite-infile option, providing the aforementioned splice sites from the GTF file (again, see the hisat2 documentation).

Regardless, you'll need to use the GTF file with stringTie.
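Both options can be sketched as command lists. This assumes the helper scripts `hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py` (both shipped with hisat2) are on your `PATH`; the file names `splicesites.txt` and `exons.txt` and the function names are hypothetical. For option 2 the alignment-time flag in hisat2 is `--known-splicesite-infile`. Building an index with splice sites and exons (option 1) needs much more memory for a human genome than a plain genomic index, so check the hisat2 manual before trying it.

```python
import shlex
import subprocess

# Hypothetical file names used for illustration.
SPLICE_SITES = "splicesites.txt"
EXONS = "exons.txt"


def option1_build_index(genome_fa, gtf, prefix, threads=8):
    """Option 1: bake the annotated splice sites and exons into the index.

    Returns (command, stdout_file) pairs; stdout_file is None when the
    command writes its own output files.
    """
    return [
        (["hisat2_extract_splice_sites.py", gtf], SPLICE_SITES),
        (["hisat2_extract_exons.py", gtf], EXONS),
        (["hisat2-build", "-p", str(threads), "--ss", SPLICE_SITES,
          "--exon", EXONS, genome_fa, prefix], None),
    ]


def option2_align(index, fq1, fq2, sam, threads=8):
    """Option 2: plain genomic index; pass the splice sites at alignment time."""
    return ["hisat2", "-p", str(threads), "--dta", "-x", index,
            "--known-splicesite-infile", SPLICE_SITES,
            "-1", fq1, "-2", fq2, "-S", sam]


def run(cmd, stdout_file=None):
    """Run one command, redirecting stdout to a file if requested."""
    if stdout_file is None:
        subprocess.run(cmd, check=True)
    else:
        with open(stdout_file, "w") as out:
            subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Dry run: print the option 1 steps with shell-style redirection.
    for cmd, dest in option1_build_index("genome.fa", "annotation.gtf", "genome_tran"):
        print(shlex.join(cmd) + (f" > {dest}" if dest else ""))
```

Either way, the same GTF is still passed to StringTie with `-G`.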

(Devon Ryan, 4 days ago)

Got it. Thank you! Could you also answer my second question above, please?

(Bioinfo, 4 days ago)

Use one of the "comprehensive gene annotation" files (either PRI or CHR, depending on which genome you downloaded). Don't use the ALL regions GTF or fasta file (I prefer PRI).

(Devon Ryan, 4 days ago)

OK, but how can I detect differentially expressed lncRNAs if the GTF file contains both protein-coding RNAs and lncRNAs?

(Bioinfo, 4 days ago)

For that analysis only do the quantification with the lncRNAs. Only use stringTie to find novel genes, not to quantify them.
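One way to quantify only lncRNAs while keeping a single comprehensive GENCODE download is to subset its GTF by `gene_type` and hand the result to the counting step. This Python sketch assumes GENCODE-style attributes; the set of gene types counted as lncRNA is an assumption that changed between releases, so compare it with the biotype list for your release (or use the GENCODE lncRNA GTF from the question, which is already restricted to lncRNAs).

```python
import gzip
import re

GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')

# Assumption: gene_type values treated as lncRNA. Recent GENCODE releases use a
# single "lncRNA" type; older ones (e.g. v24) split it into several biotypes.
# Check the biotype list for your release before relying on this set.
LNCRNA_TYPES = {
    "lncRNA", "lincRNA", "antisense", "sense_intronic", "sense_overlapping",
    "3prime_overlapping_ncRNA", "macro_lncRNA", "bidirectional_promoter_lncRNA",
}


def is_lncrna_record(line, types=LNCRNA_TYPES):
    """True for a GTF record whose gene_type attribute is in `types`."""
    if line.startswith("#"):
        return False
    match = GENE_TYPE_RE.search(line)
    return match is not None and match.group(1) in types


def filter_gtf(in_path, out_path, types=LNCRNA_TYPES):
    """Copy only gene/transcript/exon records of lncRNA genes; reads .gz too."""
    opener = gzip.open if in_path.endswith(".gz") else open
    with opener(in_path, "rt") as src, open(out_path, "w") as dst:
        for line in src:
            if is_lncrna_record(line, types):
                dst.write(line)
```

For example, `filter_gtf("gencode.annotation.gtf.gz", "lncRNA_only.gtf")` (hypothetical file names) writes a GTF that featureCounts can use as its `-a` annotation.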

(Devon Ryan, 4 days ago)

Hi Devon,

In this paper [https://www.nature.com/articles/ncomms14421], please check Figure A: after transcript assembly and merging, they examined how the transcripts compare with the reference annotation. So for this step I should use the GENCODE annotation GTF file that I will be using for StringTie. From this comparison we can assign transcripts as protein-coding or lncRNA, and the unannotated genes will be used for detecting novel lncRNAs. Am I right? In my case, as I'm using the HISAT2 and StringTie pipeline, I will use gffcompare (which is mentioned in the paper [https://www.nature.com/articles/nprot.2016.095]).
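For the gffcompare route, each assembled transcript in the annotated GTF that gffcompare writes (`<prefix>.annotated.gtf`, with the prefix set by `-o`) carries a `class_code` attribute describing how it relates to the reference. Below is a minimal Python sketch for pulling out unannotated candidates; treating `u`, `i` and `x` as the "novel" codes is an assumption (a common starting point, not a rule), and candidates would still need length and coding-potential filtering before being called lncRNAs.

```python
import re

# Matches quoted GTF attributes such as: class_code "u";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Assumption: class codes kept as candidate novel lncRNA loci:
# "u" intergenic, "i" inside a reference intron, "x" exonic overlap on the
# opposite strand. Adjust to your own criteria.
NOVEL_CODES = {"u", "i", "x"}


def candidate_novel(annotated_gtf_lines, codes=NOVEL_CODES):
    """Return (transcript_id, class_code) for transcripts with a wanted code."""
    hits = []
    for line in annotated_gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # class_code is read from transcript records only
        attrs = dict(ATTR_RE.findall(fields[8]))
        code = attrs.get("class_code")
        if code in codes:
            hits.append((attrs.get("transcript_id"), code))
    return hits
```

For example, `candidate_novel(open("merged.annotated.gtf"))` with a hypothetical file name returns the candidate transcript IDs in file order.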

(Bioinfo, 4 days ago)

Sounds correct, though I don't have time at the moment to go through that paper thoroughly.

(Devon Ryan, 4 days ago)