Question: lncRNA differential expression analysis
5 months ago, seta wrote:

Dear all,

I'm working on an RNA-seq analysis of case and control samples and have identified some differentially expressed coding genes and long non-coding RNAs (lncRNAs) using edgeR. I would like to do an integrative lncRNA-mRNA analysis. The library size of the cases is small (about 2 million raw counts) compared to the controls (about 60 million raw counts), so I filtered out genes with a CPM value of less than 5 during the edgeR analysis. Given that lncRNAs usually have low expression values, I'm concerned that some lncRNAs may be missed at this CPM threshold. Could you please share your thoughts on the analysis?

Thanks a lot

5 months ago, Kevin Blighe wrote:

It is indeed the case that lncRNAs are lowly expressed, apart from notable exceptions (MALAT1, XIST, TSIX, etc.) under certain conditions. Are you more worried about the difference in library size? Neither of us can see your data and try out different filters. However, you could start with CPM > 1 as a minimum cut-off, and take it from there.

It is likely, in my opinion, that many known lncRNAs are merely reflective of 'transcriptional noise' (being expressed as a result of the expression of nearby protein-coding genes, for example) and, for all intents and purposes, may have no function other than to occupy volume in the nucleus and cytoplasm, where they will be digested. They are still expressed, though, and using a cut-off of 1 will at least ensure that these are included in your analysis, for better or for worse.

If your approach is ultimately about correlation, then the large library size differences may not have as large an effect as you think (because correlation metrics will be independent of it). Again, though, we cannot see the data; I would be checking histograms, box-and-whisker plots, summary statistics, etc.
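One way to "take it from there" is to count how many genes survive each candidate cut-off before committing to one. The sketch below is only an illustration in plain Python, not edgeR code: the function names, the toy count matrix, and the rule "CPM above the cut-off in at least `min_samples` samples" are all assumptions made for the example. The two toy columns mimic a 2-million-read case and a 60-million-read control.

```python
def cpm(counts):
    """Convert a genes x samples count matrix to counts per million."""
    lib_sizes = [sum(column) for column in zip(*counts)]
    return [[c * 1e6 / n for c, n in zip(row, lib_sizes)] for row in counts]


def genes_kept(counts, cpm_cutoff, min_samples):
    """Count genes whose CPM exceeds cpm_cutoff in at least min_samples samples."""
    kept = 0
    for row in cpm(counts):
        if sum(value > cpm_cutoff for value in row) >= min_samples:
            kept += 1
    return kept


# Hypothetical toy data: rows are genes, columns are (case, control).
# Column totals are 2,000,000 and 60,000,000 reads respectively.
counts = [
    [1_999_977, 59_999_310],  # one very abundant gene
    [12, 360],                # CPM 6 in both samples
    [6, 180],                 # CPM 3 in both samples
    [4, 120],                 # CPM 2 in both samples
    [1, 30],                  # CPM 0.5 in both samples
]

for cutoff in (5, 1, 0.25):
    print(f"CPM > {cutoff}: {genes_kept(counts, cutoff, min_samples=2)} genes kept")
```

On real data, the same tally could be split by biotype to see how many lncRNAs, specifically, each cut-off removes.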



Many thanks, Kevin, for your help, as always. Actually, I'm not concerned about the large library size differences; my issue is the CPM cutoff for the analysis. As far as I know, genes with fewer than 10 reads should usually be removed, which is equivalent to a CPM of 0.5 for a library size of 20 million reads. As I mentioned in the post, the library size of my patient samples is about 2 million reads, so I was forced to set a high CPM cutoff (5) to filter out low-count genes (fewer than 10 reads). With this cutoff, however, many lncRNAs may be missed from the analysis, and that is my real problem. Could you please let me know if you have any suggestions?

Reply written 5 months ago by seta
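The arithmetic in the reply above (10 reads is CPM 0.5 at 20 million reads, but CPM 5 at 2 million) can be checked with a few lines of Python. This is a minimal sketch; the helper names are made up for illustration and are not part of edgeR.

```python
def cpm_for_count(min_count, library_size):
    """CPM value that corresponds to min_count reads in a library of library_size reads."""
    return min_count * 1e6 / library_size


def count_for_cpm(cpm_value, library_size):
    """Number of reads that corresponds to cpm_value in a library of library_size reads."""
    return cpm_value * library_size / 1e6


# Library sizes mentioned in the thread: cases, the 20 million example, controls.
for library_size in (2_000_000, 20_000_000, 60_000_000):
    print(
        f"{library_size:>11,} reads: "
        f"10 reads = CPM {cpm_for_count(10, library_size):.3f}, "
        f"CPM 1 = {count_for_cpm(1, library_size):.0f} reads"
    )
```

The same conversion shows what a CPM cutoff of 1 means at each depth: about 2 reads in a 2-million-read library versus about 60 reads in a 60-million-read library.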

Apart from trying different cut-offs, I have no more suggestions.

Reply written 5 months ago by Kevin Blighe

Thanks. Sorry, but do you suggest a CPM cutoff of 1 for a library size of 2 million reads? Could you also please let me know what I should look for in the output of the different cutoffs?

Reply written 5 months ago by seta

I still don't know how you plan to integrate these datasets, which is important to understand, so I am limited in how I can advise on specifics. There is no right or wrong here: you can apply the same cut-off for both, or use different cut-offs. Then proceed with your analysis, with the view that you can always go back and modify certain parameters. Having many low-expressed genes in your dataset will affect things like p-value correction, amount of required RAM, fold-change calculations, PCA, clustering, etc.

You just have to make an 'executive' decision with your own project, and then move forward with your analysis. Again, you can always later go back to modify things.

Reply written 5 months ago by Kevin Blighe
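To make the point about p-value correction concrete, here is a small sketch that assumes the Benjamini-Hochberg procedure (the thread does not say which correction is used). The p-values are invented for illustration: three small ones standing in for genuinely changing genes, plus 100 unremarkable ones standing in for low-count genes that were not filtered out.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never decrease as the raw p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three genes of interest.
signal = [0.001, 0.004, 0.010]
# Hypothetical uninformative p-values from 100 extra low-count genes.
noise = [0.2 + 0.8 * k / 99 for k in range(100)]

filtered = bh_adjust(signal)
unfiltered = bh_adjust(signal + noise)[:3]

print("adjusted, 3 tests:  ", [round(p, 3) for p in filtered])
print("adjusted, 103 tests:", [round(p, 3) for p in unfiltered])
```

With only three tests, all three genes stay below 0.05 after adjustment; with the extra 100 tests, none of them does, even though their raw p-values are unchanged.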

My goal is to do an integrative lncRNA-mRNA analysis to find the lncRNAs and their target genes that are related to a given disease, as well as to understand the corresponding regulatory roles of the lncRNAs. Yes, Kevin, I usually go ahead with the analysis and then go back and redo it. However, consulting experienced people like you is always valuable to me.

Reply written 5 months ago by seta

Oh I know, but how you do that integration is important. Anyway, feel free to ask more questions!

Reply written 5 months ago by Kevin Blighe