Question

microRNA-seq analysis from RNA-seq data

4.0 years ago

seta ★ 1.9k

Dear all,

I've got some RNA-seq data for which mRNAs were purified using oligo-dT and sequenced on the MGI-SEQ-2000 platform, as well as some RNA-seq data produced on Illumina NovaSeq and HiSeq platforms. I'd like to analyze microRNAs and long non-coding RNAs from these datasets, but I'm not sure this is appropriate, as the data were not generated specifically for these kinds of RNA (miRNA and long non-coding RNA). What do you think? Is there any tool or pipeline for such cases?

Thanks

rna-seq small rna microrna long non-coding rna • 1.3k views

ADD COMMENT • link updated 4.0 years ago by rob.costa1234 ▴ 310 • written 4.0 years ago by seta ★ 1.9k

You will be better off doing small RNA-seq separately to analyze microRNAs or other small RNAs.

ADD REPLY • link 4.0 years ago by rob.costa1234 ▴ 310

score 3 · Answer 1 · 2020-04-12

4.0 years ago

i.sudbery 19k

You will struggle to analyse miRNAs from this data: mature miRNAs do not have poly-A tails, so they will not be captured by the oligo-dT selection. They are also generally too small (roughly 22 nt), as most RNA-seq protocols size-select for fragments larger than 200 bp.

You will have better luck with lincRNAs, many of which (although by no means all) have a poly-A tail and are, by definition, longer than 200 bp.

You analyse these lincRNAs no differently from mRNAs.

ADD COMMENT • link 4.0 years ago by i.sudbery 19k

Thank you for your explanation.

I have separated the lincRNA counts from those of the other RNAs. Given the low counts for this type of RNA (which is usually due to their low expression, am I right?), could you please give me any advice on doing the differential expression analysis? Is the edgeR package suitable? For mRNA analysis, genes with counts lower than 10 are usually filtered out before the analysis; which count threshold would you suggest for lincRNA differential analysis?

Thanks

ADD REPLY • link 4.0 years ago by seta ★ 1.9k

edgeR is fine. I would recommend not removing any low expression genes, other than those that are 0 in all samples.

I'd also not recommend separating the lncRNAs and the mRNAs at the start. Analyse them together to begin with. Then, once you've finished the analysis, subset to the lncRNA rows and recalculate the padj values using results$padj <- p.adjust(results$pvalue, method = "BH")

ADD REPLY • link 4.0 years ago by i.sudbery 19k
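The subset-then-readjust step suggested in the reply above can be illustrated with a minimal sketch. It is written in Python with only the standard library rather than in R, so it does not call edgeR itself: `bh_adjust` is a hypothetical helper that mirrors what R's `p.adjust(p, method = "BH")` computes, and the gene names, biotypes and p-values are made up for illustration. Note also that the `pvalue` and `padj` names used in the reply follow DESeq2-style naming; in the table returned by edgeR's `topTags` the corresponding columns are called `PValue` and `FDR`.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjustment, mirroring R's p.adjust(method = "BH")."""
    n = len(pvalues)
    # Walk from the largest p-value to the smallest, keeping a running minimum
    # so that the adjusted values stay monotone in the raw p-values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up results of a joint analysis of mRNAs and lncRNAs (illustration only).
results = [
    {"gene": "mRNA_a", "biotype": "protein_coding", "pvalue": 0.001},
    {"gene": "mRNA_b", "biotype": "protein_coding", "pvalue": 0.2},
    {"gene": "mRNA_c", "biotype": "protein_coding", "pvalue": 0.6},
    {"gene": "mRNA_d", "biotype": "protein_coding", "pvalue": 0.9},
    {"gene": "lnc_a", "biotype": "lncRNA", "pvalue": 0.01},
    {"gene": "lnc_b", "biotype": "lncRNA", "pvalue": 0.04},
    {"gene": "lnc_c", "biotype": "lncRNA", "pvalue": 0.5},
]

# Step 1: the test itself was run on all genes together (not shown here).
# Step 2: keep only the lncRNA rows.
lnc_rows = [row for row in results if row["biotype"] == "lncRNA"]

# Step 3: recompute the adjusted p-values over the lncRNA rows only, so the
# multiple-testing burden reflects the hypotheses actually being reported.
lnc_padj = bh_adjust([row["pvalue"] for row in lnc_rows])
for row, padj in zip(lnc_rows, lnc_padj):
    row["padj"] = padj
    print(row["gene"], round(padj, 4))
```

The raw p-values are left untouched; only the adjustment is redone, because the number of tests entering the correction changes once the table is restricted to lncRNAs.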

Thank you. I separated the counts because I read here that, because of the low expression of lncRNAs, normalizing them together with coding genes (mRNAs) introduces a major bias towards highly expressed mRNAs. So you don't agree with that?

ADD REPLY • link 4.0 years ago by seta ★ 1.9k

The more highly expressed mRNAs will have an outsized effect on the normalisation. But this is exactly why they should be included. Changes in the levels of highly expressed mRNAs will affect the sequencing real estate available for estimating lncRNA expression, and therefore should be accounted for.

ADD REPLY • link 4.0 years ago by i.sudbery 19k
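The "sequencing real estate" argument in the reply above can be made concrete with a small worked example. This is a sketch under stated assumptions, not edgeR's method: the abundances, gene names and sequencing depth are invented, counts are taken as exact expected values without sampling noise, and the size factor is a simple median of per-gene ratios, used here as a stand-in for a robust normalisation such as edgeR's TMM.

```python
from statistics import median

# Hypothetical true abundances (arbitrary units) in two samples.
abund_a = {
    "mRNA_high": 1000.0,
    "mRNA_1": 100.0,
    "mRNA_2": 100.0,
    "mRNA_3": 100.0,
    "lnc_1": 10.0,
    "lnc_2": 10.0,
}
# In sample B only the abundant mRNA changes; the lncRNAs are truly unchanged.
abund_b = dict(abund_a, mRNA_high=5000.0)

DEPTH = 1_000_000  # assumed: both samples are sequenced to the same depth


def expected_counts(abund, depth=DEPTH):
    """Expected read counts: each gene gets its share of a fixed read budget."""
    total = sum(abund.values())
    return {gene: depth * value / total for gene, value in abund.items()}


counts_a = expected_counts(abund_a)
counts_b = expected_counts(abund_b)

# Raw counts: the unchanged lncRNA appears to drop, because the abundant mRNA
# now takes up a larger share of the reads in sample B.
raw_ratio = counts_b["lnc_1"] / counts_a["lnc_1"]

# Size factor of B relative to A, estimated from all genes (mRNAs included)
# as the median of the per-gene count ratios.
size_factor_b = median(counts_b[g] / counts_a[g] for g in counts_a)

# After dividing by the size factor, the lncRNA is correctly seen as unchanged.
norm_ratio = (counts_b["lnc_1"] / size_factor_b) / counts_a["lnc_1"]

print(f"raw B/A ratio for lnc_1:        {raw_ratio:.3f}")
print(f"normalised B/A ratio for lnc_1: {norm_ratio:.3f}")
```

In this toy setting the raw lncRNA count falls to about a quarter of its value in sample A even though its abundance is identical, and the jointly estimated size factor removes that artefact.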

Thanks a lot. Just to make sure: lncRNAs and mRNAs should be analyzed together even if we would like to focus on the lncRNAs, am I right?

ADD REPLY • link 4.0 years ago by seta ★ 1.9k

That would be my recommendation. But make sure you recalculate the adjusted p-values (FDRs/q-values) after subsetting.

ADD REPLY • link 4.0 years ago by i.sudbery 19k