Question

RNA-seq Data Normalisation methods on samples associated with a global modification

9.2 years ago

tiphaine ▴ 10

Dear all,

I would like to use the same method to normalise and analyse differential expression not only for RNA-seq data but also for other sequencing count data, such as 16S and epigenetics data.

But when I look at my omics data at a high level, there seems to be a global modification between healthy and diseased samples.

I have 2 questions.

1. Can we use the normalisation methods found in DESeq, edgeR, limma, ... on such data (a global modification between 2 groups)? I understood that these methods are based on the hypothesis that most genetic elements are not DE; is that right? That does not seem to be the case in this type of omics data. If these methods cannot be used, do you have an idea of how to normalise the data?
2. For the epigenetics data, the regions that I used for read counting are not independent of each other and are not related to known genetic elements. Currently, each chromosome is split into 500 nt bins with an overlap of 250 nt. I understood that the counting rule for DESeq/edgeR/limma... is that only reads mapped to a unique place are kept. That does not hold here, because each read is counted in at least 2 different bins. Can I still use DESeq/edgeR/limma?

Regards,
Tiphaine

edgeR count-data global-modification limma DESeq2 • 2.2k views

updated 2.0 years ago by Ram 43k • written 9.2 years ago by tiphaine ▴ 10

Thanks Devon,

Unfortunately (or fortunately, depending on the point of view), I have a mostly unidirectional global shift, and it is in tissue rather than in a single cell type. I am going to look at ERCC spike-ins and at whether the same approach can be applied to other omics data. I hope it is not too hard to deal with; if it is, I will need to find a collaborator who can help me.
OK, I am going to look at DEXSeq.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by tiphaine ▴ 10

You might find this paper interesting. This is the beginning of the c-Myc story that I alluded to in my answer. I suspect that some of the problems that you're going to run into will be things encountered by them, so perhaps you'll get some useful ideas from the approach that they took.

updated 2.0 years ago by Ram 43k • written 9.2 years ago by Devon Ryan 104k

Thanks, I'll go and read it right away!

updated 2.0 years ago by Ram 43k • written 9.2 years ago by tiphaine ▴ 10

Ram · Answer 1 · 2015-02-17

It depends. If the global modification is mostly in a single direction (e.g., there's transcriptional amplification), then all of the normalization methods used by the tools you listed will break. Well, they'll still produce results, but they'll be wrong. There's actually a pretty cool story about this involving c-Myc. If most of your genes/whatever are DE but you have a rough balance in direction and magnitude of fold-change, then these methods will still work.

In any case, you want to know what to do if you do have a (mostly) unidirectional global shift. The answer is that you need something else to normalize to. In RNA-seq, it's popular to use ERCC spike-ins. So you end up normalizing the spike-ins and then applying those normalization factors to the actual data that you're interested in. A similar method would be used for any sort of omics dataset.

Keep in mind that this may not be a trivial procedure. For example, ERCC spike-ins make a lot more sense in single-cell data than with bulk tissue (otherwise, you also have to deal with cell number and/or volume). In short, this can get very complicated very quickly. Putting a lot of thought into how to proceed is a wise choice.
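To make the spike-in idea above concrete, here is a minimal sketch in plain Python. It is not the code of DESeq2, edgeR or limma; it only illustrates the median-of-ratios idea on invented toy counts. The feature names (`spike_A`, `gene_1`, ...) and the numbers are assumptions made up for the example: the "disease" sample is imagined to have twice as much RNA per cell, so after sequencing to a similar depth the gene counts look unchanged while the spike-in counts are halved.

```python
import math
from statistics import median


def size_factors(counts, features=None):
    """Median-of-ratios size factors, one per sample.

    counts:   dict mapping feature name -> list of raw counts (one per sample).
    features: optional subset of feature names to estimate the factors from
              (e.g. only the spike-ins). Defaults to all features.
    """
    names = list(counts) if features is None else list(features)
    n_samples = len(next(iter(counts.values())))
    # Features with a zero count are skipped because log(0) is undefined.
    usable = [f for f in names if all(c > 0 for c in counts[f])]
    # Log of the geometric mean of each usable feature across samples.
    log_geo = {f: sum(math.log(c) for c in counts[f]) / n_samples for f in usable}
    # Per sample: median over features of the log-ratio to the geometric mean.
    return [
        math.exp(median(math.log(counts[f][j]) - log_geo[f] for f in usable))
        for j in range(n_samples)
    ]


# Toy data (invented): columns are [healthy, disease].
counts = {
    "spike_A": [100, 50],
    "spike_B": [200, 100],
    "gene_1": [50, 50],
    "gene_2": [100, 100],
    "gene_3": [200, 200],
    "gene_4": [400, 400],
    "gene_5": [800, 800],
}
spikes = ["spike_A", "spike_B"]

# Factors estimated from every feature: the genes dominate the median,
# so the global shift is normalised away.
sf_all = size_factors(counts)

# Factors estimated from the spike-ins only, then applied to the genes.
sf_spike = size_factors(counts, features=spikes)
normalised = {
    f: [c / s for c, s in zip(row, sf_spike)]
    for f, row in counts.items()
    if f not in spikes
}

print("size factors (all features):", sf_all)
print("size factors (spike-ins):   ", sf_spike)
print("gene_1 after spike-in normalisation:", normalised["gene_1"])
```

With the all-feature factors the two samples look identical, whereas the spike-in factors recover the two-fold global increase that was built into the toy data. In R, DESeq2's `estimateSizeFactors()` has a `controlGenes` argument that can restrict the estimation to a chosen set of features such as spike-ins; check the current documentation before relying on it.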
Regarding your second question, counting a read in more than one bin is actually how DEXSeq works, and it uses DESeq2 internally. I also recall reading that limma and/or edgeR now provide methods for either exon- or transcript-level analyses. I don't recall the details there, but you should be able to find more about that by flipping through the current manuals.

Actually, here is a talk by Mark Robinson that uses exon-level metrics for statistics using limma::voom(). Combined with DEXSeq, that's a pretty good indicator that you're thinking about proceeding in a reasonable way.
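To illustrate the overlapping-bin counting described in the question (500 nt bins with a 250 nt step), here is a small sketch of which bins a single read would be counted in. The function name, the 0-based half-open coordinates and the assumption that bin 0 starts at position 0 are all choices made for this example, not something stated in the thread.

```python
def overlapping_bins(start, end, bin_size=500, step=250):
    """Indices of the sliding bins overlapped by the read interval [start, end).

    Bin k is assumed to cover [k * step, k * step + bin_size).
    """
    if end <= start:
        raise ValueError("read must have positive length")
    # First bin whose end lies beyond the read start.
    first = max(0, (start - bin_size) // step + 1)
    # Last bin whose start lies before the read end.
    last = (end - 1) // step
    return list(range(first, last + 1))


# A 50 nt read in the interior falls into two bins.
print(overlapping_bins(300, 350))
# A read straddling a bin boundary falls into three bins.
print(overlapping_bins(480, 520))
# A read within the first 250 nt of the chromosome falls into only one bin.
print(overlapping_bins(100, 150))
```

Under these assumptions most reads contribute to two or three bins, so neighbouring bin counts are not independent, which is the same situation as a read overlapping several exon bins in an exon-level analysis.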