Question: How to do differential expression of circular RNAs?
BehMah30 wrote:

Following RNA-seq data analysis, I have a list of circRNAs and their RPM (reads per million mapped reads) values for 10 human (hsa) samples (5 case, 5 control). I wish to analyse differential expression (case vs. control) using the Wilcoxon rank-sum test (non-parametric). Can anybody tell me the right tool and the right code/command?

rna-seq • 1.6k views
modified 2.4 years ago • written 2.4 years ago by BehMah30

I would advise against this; use raw count data instead and run a limma or edgeR analysis.

written 2.4 years ago by Benn7.5k

I have to use mapping information; that's why I am using RPM values.

written 2.4 years ago by BehMah30

So you mean you don't have the original data?

written 2.4 years ago by WouterDeCoster40k

I do, but I have to use RPM, as recommended by publications: my experiment concerns circRNA transcripts, and I should normalise them to the number of mapped reads for a better analysis in this case.
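
For reference, RPM as described here is each circRNA's read count scaled by the sample's total number of mapped reads. A minimal sketch, assuming made-up back-splice junction counts and library sizes (the names `rpm`, `counts` and `mapped_reads` are illustrative, not taken from any particular tool):

```python
def rpm(count, mapped_reads):
    """Reads per million mapped reads for one feature in one sample."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return count / mapped_reads * 1_000_000


# Hypothetical back-splice junction counts for two samples.
counts = {
    "circ_A": {"case_1": 12, "control_1": 30},
    "circ_B": {"case_1": 0, "control_1": 5},
}
# Hypothetical total mapped reads per sample.
mapped_reads = {"case_1": 20_000_000, "control_1": 50_000_000}

rpm_table = {
    circ: {sample: rpm(n, mapped_reads[sample]) for sample, n in per_sample.items()}
    for circ, per_sample in counts.items()
}
```

In this toy example `circ_A` comes out at about 0.6 RPM in both samples even though the raw counts are 12 and 30. The scaling removes the information about how many reads supported each value, which is one reason count-based tools ask for raw counts and handle library size themselves.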

written 2.4 years ago by BehMah30

Then why didn't you say that from the beginning? We can't read your mind.

written 2.4 years ago by WouterDeCoster40k

I thought it wouldn't matter :) Sorry about that. I'm a bit new to bioinformatics :(

written 2.4 years ago by BehMah30

It's better to add too much information than too little. You asked your question 6 hours ago, and we have only just figured out what you are working on. Meanwhile, this question is sinking lower and lower on the forum, possibly escaping the view of people who can help you.

So I advise you to update your post and change the title to reflect "differential expression of circular RNAs". Don't forget to mention things like the organism you are working on, the number of samples, and the data you have. Perhaps also add a reference to a publication suggesting that you use RPM.

Editing your first post will bump the post back to the top of the list, which is, in this case, convenient because you need some attention for your new information. (But don't abuse this feature...)

written 2.4 years ago by WouterDeCoster40k

Thanks WouterDeCoster :)

written 2.4 years ago by BehMah30

Have you seen this answer by @Kanne? Additionally, I'm curious: which publication recommended RPM for DE? I'd generally agree with @WouterDeCoster that DESeq2's library size normalisation strategy makes sense.
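
The library size normalisation mentioned here is usually described as the median-of-ratios method. A minimal pure-Python sketch of that calculation, assuming a small made-up count matrix (this illustrates the idea only and is not DESeq2's actual code; `size_factors` and `raw` are hypothetical names):

```python
import math
from statistics import median


def size_factors(counts):
    """counts: dict of sample -> list of raw counts, same feature order in every sample."""
    samples = list(counts)
    n_features = len(counts[samples[0]])
    # Log geometric mean of each feature across samples.
    # Features with a zero in any sample are skipped (None).
    log_geo_means = []
    for i in range(n_features):
        values = [counts[s][i] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    if all(lg is None for lg in log_geo_means):
        raise ValueError("every feature has a zero count in some sample")
    # Size factor = median ratio of a sample's counts to the per-feature geometric mean.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][i]) - lg
            for i, lg in enumerate(log_geo_means)
            if lg is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


# Hypothetical raw counts: sample_2 was sequenced about twice as deeply as sample_1.
raw = {"sample_1": [10, 20, 30, 0], "sample_2": [20, 40, 60, 5]}
sf = size_factors(raw)
```

Here the size factor of `sample_2` is twice that of `sample_1`, so dividing each sample's raw counts by its factor puts the two on a comparable scale without discarding the counts themselves.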

written 2.4 years ago by andrew.j.skelton735.8k

What makes you think your approach is a valid way to perform differential expression analysis? Unless you have good reasons to do otherwise, this Bioconductor workflow is your starting point for every typical analysis.

written 2.4 years ago by WouterDeCoster40k

I already used DESeq analysis with raw counts, but it didn't give me logical results in my case, as the gene IDs are non-coding circRNAs, not actual genes. The literature used RPM normalisation for differential expression.

written 2.4 years ago by BehMah30
BehMah30 wrote:

Any views on how to do the analysis using RPM?

written 2.4 years ago by BehMah30

If I remember the circRNA literature correctly, RPMs are used (and useful) to determine expression levels of circRNAs vs. their linear counterparts. It is also a very specific case because of the nature of circRNAs: the only reads used tend to be those spanning the splice junctions, unique (or not) to the circRNAs. To be honest, I don't even remember which paper used RPMs other than to establish the expression of the linear form (if any).

With that out of the way, for DGE, use counts. It is the best method around, and it should work with circRNAs. Do look out for a weird fit to the data and/or low counts, which might happen when using backsplice reads only.

I already used DESeq analysis with raw counts, but it didn't give me logical results in my case, as the gene IDs are non-coding circRNAs, not actual genes.

This is an issue of annotations, not of the DGE method. DGE tools are agnostic to the nature of the annotations (coding, non-coding, intergenic reads, etc.). You will need to overlap the circRNA junctions with a gene annotation to get the host gene name, but that step is independent of the DGE analysis.
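
The overlap step could look roughly like the following. This is a minimal sketch with made-up coordinates and gene names; `host_genes`, the tuple layout and the example intervals are all assumptions, and a real analysis would read a GTF/BED annotation and probably use a dedicated interval tool.

```python
def host_genes(junction, genes):
    """Return names of genes on the same chromosome and strand that overlap the junction span.

    junction: (chrom, start, end, strand)
    genes: list of (chrom, start, end, strand, name)
    Coordinates are treated as closed intervals.
    """
    chrom, start, end, strand = junction
    return [
        name
        for g_chrom, g_start, g_end, g_strand, name in genes
        if g_chrom == chrom
        and g_strand == strand
        and g_start <= end
        and start <= g_end
    ]


# Hypothetical gene annotation.
genes = [
    ("chr1", 1000, 5000, "+", "GENE_A"),
    ("chr1", 4500, 9000, "-", "GENE_B"),
    ("chr2", 100, 900, "+", "GENE_C"),
]
# Hypothetical back-splice junctions, keyed by circRNA ID.
junctions = {
    "circ_1": ("chr1", 2000, 3000, "+"),
    "circ_2": ("chr1", 6000, 7000, "+"),
}

annotation = {circ: host_genes(j, genes) for circ, j in junctions.items()}
```

In this toy data `circ_1` is assigned to `GENE_A`, while `circ_2` has no host gene on its strand and would be treated as intergenic. The resulting table can be joined to the DGE results by circRNA ID afterwards, so the test itself never needs to know about host genes.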

The literature used RPM normalisation for differential expression

Could you please provide a reference?

written 2.4 years ago by A. Domingues2.1k

Also, it is bad form on a Q&A site such as Biostar to submit a comment/request as an "answer".

written 2.4 years ago by A. Domingues2.1k
BehMah30 wrote:

The publications that used RPM for normalization include "Complementary sequence-mediated exon circularization". RPM was also used for miRNA normalization in PMID: 24625073 and PMID: 26538400.

written 2.4 years ago by BehMah30

Oh yes, I had forgotten about Zhang et al. However, you do have a different setting (better in my opinion), in that you have replicates, and I am not sure a Wilcoxon rank-sum test is the most appropriate for your data. I will let a statistician help you with that one.
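
One way to see the concern with a rank-sum test at this sample size is to work out the smallest p-value it can produce. A minimal sketch, assuming no tied values and an exact two-sided test (the function name and the example RPM values are illustrative only):

```python
from itertools import combinations
from math import comb


def exact_rank_sum_p(case, control):
    """Exact two-sided Wilcoxon rank-sum p-value, assuming no tied values."""
    pooled = sorted(case + control)
    ranks = {v: r for r, v in enumerate(pooled, start=1)}
    n, total = len(case), len(pooled)
    observed = sum(ranks[v] for v in case)
    expected = n * (total + 1) / 2  # mean rank sum of the case group under the null
    # Count the rank assignments at least as far from the mean as the observed one.
    extreme = sum(
        1
        for subset in combinations(range(1, total + 1), n)
        if abs(sum(subset) - expected) >= abs(observed - expected)
    )
    return extreme / comb(total, n)


# Hypothetical RPM values with complete separation between the groups.
case = [5.1, 6.3, 7.0, 8.2, 9.4]
control = [0.2, 0.9, 1.5, 2.1, 3.3]
p_min = exact_rank_sum_p(case, control)  # 2 / 252, about 0.0079
```

Even with perfectly separated groups, 5 vs. 5 samples cannot give a two-sided p-value below 2/252. A Bonferroni correction over 7 or more circRNAs already requires p < 0.05/7 (about 0.0071), so no single circRNA could pass it, which limits what this test can report across a whole list of circRNAs.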

As for the other 2 references, as far as I can tell, one advocates DGE methods based on counts, and the other compares miRNA quantification (not DGE) between microarrays and HTS, so I am not sure what the relevance is for your question. Again, for DGE, and unless there is a good case against it, use counts.

written 2.4 years ago by A. Domingues2.1k

Thanks a lot A.Domingues :)

written 2.4 years ago by BehMah30

Please use ADD COMMENT or ADD REPLY to respond to earlier reactions, so that this thread remains logically structured and easy to follow. "Adding an answer" is only for answers to the original question.

written 2.4 years ago by WouterDeCoster40k