Hi
I'm very new to lncRNA analysis. I'm using the HISAT2, StringTie and Ballgown pipeline for differential expression analysis.
If I'm only looking for lncRNAs, I plan to use the lncRNA annotation from GENCODE [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]
First question:
For example, from this paper [https://www.nature.com/articles/nprot.2016.095]:
HISAT2 command: map the reads for each sample to the reference genome
hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam
StringTie command: assemble transcripts for each sample
stringtie -p 8 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam
In which step do I need to use the above GTF file (the GENCODE lncRNA annotation) [ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.long_noncoding_RNAs.gtf.gz]: in the HISAT2 step or the StringTie step?
Second question:
If I want to get both protein-coding RNAs and lncRNAs, which GTF file from GENCODE [http://www.gencodegenes.org/releases/current.html] should I use?
That way, after differential expression analysis, I will have both differentially expressed protein-coding RNAs and lncRNAs. How can I then keep only the lncRNAs?
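To make the filtering question concrete, here is a minimal sketch of one way it might be done, not a confirmed part of the pipeline. It assumes an uncompressed GENCODE GTF whose ninth column carries `gene_id` and `gene_type` attributes. The helper name `lncrna_gene_ids` is hypothetical, and the biotype list passed to it is an assumption that should be checked against the GENCODE release in use.

```shell
# Hypothetical helper: print the gene_id of every "gene" record in a GENCODE GTF
# whose gene_type matches one of the given biotypes.
#   $1 = path to an uncompressed GENCODE GTF
#   $2 = biotypes as a regex alternation, e.g. "lincRNA|antisense"
#        (an assumed, incomplete list; check it against your release)
lncrna_gene_ids() {
  gtf="$1"
  biotypes="$2"
  awk -F'\t' -v pat="gene_type \"(${biotypes})\";" '
    # skip header lines, keep gene-level records with a matching gene_type
    $0 !~ /^#/ && $3 == "gene" && $9 ~ pat {
      if (match($9, /gene_id "[^"]+"/)) {
        # drop the 9-character prefix (gene_id, space, quote) and the closing quote
        print substr($9, RSTART + 9, RLENGTH - 10)
      }
    }' "$gtf"
}
```

The printed IDs could then be matched against the gene IDs in the differential expression table, provided that table carries the reference gene IDs from the same annotation.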
It would be very helpful if you could clear up my doubts. Thank you.