lncRNA analysis
6 months ago
Carly ▴ 10

Hello,

I have a bulk RNAseq dataset that I have analyzed using STAR and limma. We were able to pull out some interesting trends based on the results. I decided to try and do the analysis with lncRNA to see if I could pull out any DEGs between our different treatments. The pipeline I followed was:

1. STAR alignment, using the full GENCODE FASTA and GTF files
2. Qualimap, to check the alignment
3. featureCounts, this time using the lncRNA GTF from GENCODE
4. MultiQC: the results look reasonable. I end up pulling out about 2-3% of the genome with this GTF, but that makes sense given that it is just lncRNA
5. limma, filtering and normalization: I end up with ~3000 transcripts
6. limma, DE with voom

My questions are: (1) Does this pipeline make sense, specifically the way I pulled out the lncRNA? There is very little guidance on how to do this. (2) I end up with a lot of the same trends with the lncRNA as with the mRNA analysis. For example, I get the same patterns of MDS clustering and unsupervised hierarchical clustering. Is this normal? I didn't expect to see the same trends with just the lncRNA.

Overall, I feel confident in the pipeline above for the mRNA analysis, but I am unsure whether it is correct for lncRNA. Any help would be appreciated.

Thanks,

Carly

lncRNA RNAseq • 402 views

Why not do both at the same time? Quantify everything, then just pick the lncRNAs out of the DE results table. Counting only lncRNAs with featureCounts changes the library size that limma works with.
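To make the library-size point concrete, here is a minimal sketch in plain Python (not limma code). The gene names and counts are made up for illustration, and `cpm` is a hypothetical helper that computes simple counts per million, which is only a rough stand-in for what a real normalization step does.

```python
# Hypothetical toy counts for a single sample: gene -> (biotype, read count).
counts = {
    "GENE_A": ("protein_coding", 9000),
    "GENE_B": ("protein_coding", 700),
    "LNC_1": ("lncRNA", 200),
    "LNC_2": ("lncRNA", 100),
}


def cpm(count, library_size):
    """Counts per million for a given library size."""
    return count / library_size * 1_000_000


# Library size when every annotated gene is counted.
full_lib = sum(c for _, c in counts.values())

# Library size when only lncRNA features are counted.
lnc_lib = sum(c for biotype, c in counts.values() if biotype == "lncRNA")

# The same raw count for LNC_1 gives very different scaled values.
cpm_full = cpm(counts["LNC_1"][1], full_lib)
cpm_lnc_only = cpm(counts["LNC_1"][1], lnc_lib)

print(full_lib, lnc_lib)
print(round(cpm_full), round(cpm_lnc_only))
```

The raw count for LNC_1 is identical in both cases; only the denominator changes, which is why counting against the lncRNA-only GTF shifts everything downstream.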


Agreed. I would always keep workflows as generic as possible, so always quantify all annotated genes/features and subset later. In fact, I think DE calling benefits from having more rather than fewer genes, because both the normalization and the dispersion estimation are more reliable. What you can do (e.g. when using edgeR or DESeq2) is subset to only the lncRNAs before the multiple testing correction. In DESeq2 that is before calling results(), and in edgeR it is before topTags() (depending on the type of test you use).
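As a rough illustration of why the order matters, here is a small plain-Python sketch (not the DESeq2 or edgeR API). The p-values and gene names are invented, and `bh_adjust` is a hypothetical helper implementing the standard Benjamini-Hochberg adjustment. It compares adjusting over all genes with adjusting over the lncRNA subset only.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene test results: (gene, biotype, raw p-value).
results = [
    ("GENE_A", "protein_coding", 0.001),
    ("GENE_B", "protein_coding", 0.20),
    ("GENE_C", "protein_coding", 0.50),
    ("LNC_1", "lncRNA", 0.01),
    ("LNC_2", "lncRNA", 0.04),
    ("GENE_D", "protein_coding", 0.80),
]

# Adjust across all tested genes.
all_genes = [g for g, _, _ in results]
all_padj = dict(zip(all_genes, bh_adjust([p for _, _, p in results])))

# Subset to lncRNAs first, then adjust across that subset only.
lnc = [(g, p) for g, biotype, p in results if biotype == "lncRNA"]
lnc_padj = dict(zip([g for g, _ in lnc], bh_adjust([p for _, p in lnc])))

for gene in lnc_padj:
    print(gene, round(all_padj[gene], 3), round(lnc_padj[gene], 3))
```

The raw p-values come from the model fitted on all genes in both cases; only the set of tests that the correction accounts for differs.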


OK, thanks for the response. That makes sense to me.

What is the best way to go about this? Is it best to use the list of lncRNA gene names to subset the filtered and normalized count data, or to subset the eBayes output?