Question: microRNA-seq analysis from RNA-seq data
seta1.4k (Sweden) wrote 7 months ago:

Dear all,

I have some RNA-seq data for which mRNAs were purified using oligo-dT and sequenced on the MGI-SEQ-2000 platform, as well as some RNA-seq data produced on the Illumina NovaSeq and HiSeq platforms. I would like to analyze microRNAs and long non-coding RNAs from these datasets, but I'm not sure this is appropriate, as the data were not generated specifically for these kinds of RNA (miRNA and long non-coding RNA). What do you think? Is there any tool or pipeline for such cases?

Thanks

modified 7 months ago by rob.costa1234250 • written 7 months ago by seta1.4k

You would be better off doing small RNA-seq separately to analyze microRNAs or other small RNAs.

written 7 months ago by rob.costa1234250
i.sudbery9.8k (Sheffield, UK) wrote 7 months ago:

You will struggle to analyse miRNAs from this data, as mature miRNAs do not have poly-A tails and will not be selected by the oligo-dT. They are also generally too small, as most RNA-seq protocols size-select for fragments larger than 200 nt.

You will have better luck with lincRNAs, many of which (although by no means all) have a poly-A tail and are, by definition, longer than 200 nt.

You analyse these lincRNAs no differently from mRNAs.

written 7 months ago by i.sudbery9.8k

Thank you for your explanation.

I have separated the lincRNA counts from those of the other RNAs. Given the low counts for this type of RNA (which is usually due to their low expression, am I right?), could you please give me some advice on differential expression analysis? Is the edgeR package suitable? For mRNA analysis, genes with counts lower than 10 are usually filtered out beforehand; which count threshold would you suggest for lincRNA differential analysis?

Thanks

written 7 months ago by seta1.4k

edgeR is fine. I would recommend not removing any low expression genes, other than those that are 0 in all samples.
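A minimal base-R sketch of that filter, assuming the counts are held in a genes-by-samples matrix. The matrix, gene names and sample names below are made up for illustration.

```r
# Toy count matrix: rows are genes, columns are samples (made-up values).
counts <- matrix(
  c(  0,  0,   0,   0,
      5,  0,   2,   1,
    120, 98, 143, 110),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Keep every gene that has at least one read in at least one sample;
# only genes that are 0 in all samples are dropped.
keep <- rowSums(counts) > 0
filtered <- counts[keep, , drop = FALSE]
```

Here only geneA is removed; the low-count geneB stays in the analysis.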

I'd also recommend not separating the lncRNAs and the mRNAs at the start. Analyse them together to begin with. Then, once you've finished the analysis, subset for the lncRNA rows and recalculate the padj values using results$padj <- p.adjust(results$pvalue, method = "BH")
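As a rough sketch of the subset-then-readjust step, using only base R: the results table, its biotype column and the p-values are hypothetical and stand in for the output of a joint analysis. The pvalue/padj column names follow the snippet above; a table from edgeR's topTags would name them PValue and FDR instead.

```r
# Hypothetical results from a joint lncRNA + mRNA analysis (made-up p-values).
results <- data.frame(
  gene    = paste0("gene", 1:10),
  biotype = rep(c("lncRNA", "protein_coding"), times = c(3, 7)),
  pvalue  = c(0.001, 0.02, 0.6, 0.0005, 0.01, 0.04, 0.2, 0.35, 0.7, 0.9),
  stringsAsFactors = FALSE
)

# Adjusted p-values over all genes, as the joint analysis would report them.
results$padj <- p.adjust(results$pvalue, method = "BH")

# Subset to the lncRNA rows, then recompute BH over that subset only,
# so the correction reflects the number of lncRNA tests actually considered.
lnc <- results[results$biotype == "lncRNA", ]
lnc$padj <- p.adjust(lnc$pvalue, method = "BH")
```

The biotype annotation would in practice come from the gene annotation used for counting (for example a GTF), which is an assumption here.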

written 7 months ago by i.sudbery9.8k

Thank you. I separated the counts because I read here that, owing to the low expression of lncRNAs, normalizing them together with coding genes (mRNA) introduces a major bias towards highly expressed mRNAs. So you don't agree with that?

written 7 months ago by seta1.4k

The more highly expressed mRNAs will have an outsized effect on the normalisation. But this is exactly why they should be included. Changes in the levels of highly expressed mRNAs will affect the sequencing real estate available for estimating lncRNA expression, and therefore should be accounted for.
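A toy calculation of the "sequencing real estate" effect, in base R. The abundances and the fixed depth are invented numbers, and the model (each gene receives reads in proportion to its share of the RNA pool) is a simplifying assumption.

```r
# Hypothetical true RNA abundances (arbitrary units) in two samples.
# Only the highly expressed mRNA changes; the lncRNA is identical in both.
abundance <- cbind(
  sample1 = c(mRNA_high = 1000, mRNA_other = 500, lncRNA_1 = 10),
  sample2 = c(mRNA_high = 3000, mRNA_other = 500, lncRNA_1 = 10)
)
depth <- 1e6  # assumed: the same number of reads is sequenced per sample

# Expected reads per gene: each gene's share of the pool times the depth.
expected_reads <- sweep(abundance, 2, colSums(abundance), "/") * depth

# Expected lncRNA reads in each sample.
round(expected_reads["lncRNA_1", ])
```

Under these assumptions the lncRNA receives fewer reads in sample2 although its abundance has not changed, purely because the abundant mRNA takes up more of the library.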

written 7 months ago by i.sudbery9.8k

Thanks a lot. Just to make sure: lncRNAs and mRNAs should be analyzed together even if we would like to focus on lncRNAs, am I right?

written 7 months ago by seta1.4k

That would be my recommendation. But make sure you recalculate the adjusted p-values (FDRs/q-values) after subsetting.

written 7 months ago by i.sudbery9.8k